SECRETED PROTEINS OF MYCOBACTERIUM TUBERCULOSIS AND
THEIR USE AS VACCINES AND DIAGNOSTIC REAGENTS 

Background of the Invention 
The invention is in the field of tuberculosis and, 
specifically, reagents useful for generating immune responses 
to Mycobacterium tuberculosis and for diagnosing infection 
and disease in a subject that has been exposed to M. 
tuberculosis.

Tuberculosis infection continues to be a world-wide 
health problem. This situation has recently been greatly 
exacerbated by the emergence of multi-drug resistant strains 
of M. tuberculosis and the international AIDS epidemic. It 
has thus become increasingly important that effective 
vaccines against and reliable diagnostic reagents for M. 
tuberculosis be produced. 

U.S. application no. 08/796,792 is incorporated herein 
by reference in its entirety.

Summary of the Invention 

The invention is based on the discovery of a novel group 
of open reading frames (ORFs) encoding polypeptides that are 
secreted by M. tuberculosis. The invention features these
polypeptides, functional segments thereof, DNA molecules 
encoding either the polypeptides or the functional segments, 
vectors containing the DNA molecules, cells transformed by 
the vectors, compositions containing one or more of any of 
the above polypeptides, functional segments, or DNA 
molecules, and a variety of diagnostic, therapeutic, and 
prophylactic (vaccine) methodologies utilizing the foregoing. 

Specifically, the invention features an isolated DNA
molecule containing a DNA sequence encoding a polypeptide
with a first amino acid sequence that can be the amino acid
sequence of the polypeptide MTSP1, MTSP2, MTSP3, MTSP4,
MTSP5, MTSP6, MTSP7, MTSP8, MTSP9, MTSP10, MTSP11, MTSP12,
MTSP13, MTSP14, MTSP15, MTSP16, MTSP17, MTSP18, MTSP19,
MTSP20, MTSP21, MTSP22, MTSP23, MTSP24, MTSP25, MTSP26,
MTSP27, MTSP28, MTSP29, MTSP30, MTSP31, MTSP32, MTSP33,
MTSP34, MTSP35, MTSP36, MTSP37, MTSP38, MTSP39, MTSP40,
MTSP41, MTSP42, MTSP43, MTSP44, MTSP45, MTSP46, or MTSP47, as
depicted in Fig. 1, or a second amino acid sequence identical
to the first amino acid sequence with conservative
substitutions; the polypeptide has Mycobacterium tuberculosis
specific antigenic and immunogenic properties. Also included
in the invention is an isolated portion of the above DNA
molecule. The portion of the DNA molecule encodes a segment
of the polypeptide shorter than the full-length polypeptide,
and the segment has Mycobacterium tuberculosis specific
antigenic and immunogenic properties. Other embodiments of
the invention are vectors containing the above DNA molecules
and transcriptional and translational regulatory sequences
operationally linked to the DNA sequence; the regulatory
sequences allow for the expression of the polypeptide or
functional segment encoded by the DNA sequence in a cell.
The invention encompasses cells (e.g., eukaryotic and
prokaryotic cells) transformed with the above vectors.

The invention encompasses compositions containing any of
the above vectors and a pharmaceutically acceptable diluent
or filler. Other compositions to be used as DNA vaccines can
contain at least two (e.g., three, four, five, six, seven,
eight, nine, ten, twelve, fifteen, or twenty) DNA sequences,
each encoding a polypeptide of the Mycobacterium tuberculosis
complex or a functional segment thereof, with the DNA
sequences being operationally linked to transcriptional and
translational regulatory sequences which allow for expression
of each of the polypeptides in a cell of a vertebrate. In
such compositions, at least one of the DNA sequences contains
the sequence of the above DNA molecules of the invention.
The invention also features an isolated polypeptide with
a first amino acid sequence that can be the sequence of the
polypeptide MTSP1, MTSP2, MTSP3, MTSP4, MTSP5, MTSP6, MTSP7,
MTSP8, MTSP9, MTSP10, MTSP11, MTSP12, MTSP13, MTSP14, MTSP15,
MTSP16, MTSP17, MTSP18, MTSP19, MTSP20, MTSP21, MTSP22,
MTSP23, MTSP24, MTSP25, MTSP26, MTSP27, MTSP28, MTSP29,
MTSP30, MTSP31, MTSP32, MTSP33, MTSP34, MTSP35, MTSP36,
MTSP37, MTSP38, MTSP39, MTSP40, MTSP41, MTSP42, MTSP43,
MTSP44, MTSP45, MTSP46, or MTSP47, as depicted in Fig. 1, or
a second amino acid sequence identical to the first amino
acid sequence with conservative substitutions. The
polypeptide has Mycobacterium tuberculosis specific antigenic
and immunogenic properties. Also included in the invention
is an isolated segment of this polypeptide, the segment being
shorter than the full-length polypeptide and having
Mycobacterium tuberculosis specific antigenic and immunogenic
properties. Other embodiments are compositions containing
the polypeptide, or functional segment, and a
pharmaceutically acceptable diluent or filler. Compositions
of the invention can also contain at least two (e.g., three,
four, five, six, seven, eight, nine, ten, twelve, fifteen, or
twenty) polypeptides of the Mycobacterium tuberculosis
complex, or functional segments thereof, with at least one of
the at least two polypeptides having the sequence of one of
the above described polypeptides of the invention.

The invention also features methods of diagnosis. One
embodiment is a method involving: (a) administration of one
of the above polypeptide compositions to a subject suspected
of having or being susceptible to Mycobacterium tuberculosis
infection; and (b) detecting an immune response in the
subject to the composition, as an indication that the subject
has or is susceptible to Mycobacterium tuberculosis
infection. Another embodiment is a method that involves: (a)
providing a population of cells containing CD4 T lymphocytes
from a subject; (b) providing a population of cells
containing antigen presenting cells (APC) expressing a major
histocompatibility complex (MHC) class II molecule expressed
by the subject; (c) contacting the CD4 lymphocytes of (a)
with the APC of (b) in the presence of one or more of the
polypeptides, functional segments, and/or polypeptide
compositions of the invention; and (d) determining the
ability of the CD4 lymphocytes to respond to the polypeptide,
as an indication that the subject has or is susceptible to
Mycobacterium tuberculosis infection. Another diagnostic
method of the invention involves: (a) contacting a
polypeptide, a functional segment, or a
polypeptide/functional segment composition of the invention
with a bodily fluid of a subject; and (b) detecting the presence
of binding of antibody to the polypeptide, functional
segment, or polypeptide/functional segment composition, as an
indication that the subject has or is susceptible to
Mycobacterium tuberculosis infection.
Also encompassed by the invention are methods of
vaccination. These methods involve administration of any of
the above polypeptides, functional segments, or DNA
compositions to a subject. The compositions can be
administered alone or with one or more of the other
compositions.

As used herein, an "isolated DNA molecule" is a DNA
which is one or both of: not immediately contiguous with one
or both of the coding sequences with which it is immediately
contiguous (i.e., one at the 5' end and one at the 3' end) in
the naturally-occurring genome of the organism from which the
DNA is derived; or which is substantially free of DNA
sequence with which it occurs in the organism from which the
DNA is derived. The term includes, for example, a
recombinant DNA which is incorporated into a vector, e.g., into
an autonomously replicating plasmid or virus, or into the
genomic DNA of a prokaryote or eukaryote, or which exists as
a separate molecule (e.g., a cDNA or a genomic fragment
produced by PCR or restriction endonuclease treatment)
independent of other DNA sequences. Isolated DNA also
includes a recombinant DNA which is part of a hybrid DNA
encoding additional M. tuberculosis polypeptide sequences.
"DNA molecules" include cDNA, genomic DNA, and synthetic
(e.g., chemically synthesized) DNA. Where single-stranded,
the DNA molecule may be a sense strand or an antisense
strand.

An "isolated polypeptide" of the invention is a
polypeptide which either has no naturally-occurring
counterpart, or has been separated or purified from
components which naturally accompany it, e.g., in M.
tuberculosis bacteria. Typically, the polypeptide is
considered "isolated" when it is at least 70%, by dry weight,
free from the proteins and naturally-occurring organic
molecules with which it is naturally associated. Preferably,
a preparation of a polypeptide of the invention is at least
80%, more preferably at least 90%, and most preferably at
least 99%, by dry weight, the peptide of the invention.
Since a polypeptide that is chemically synthesized is, by its
nature, separated from the components that naturally
accompany it, the synthetic polypeptide is "isolated."

An isolated polypeptide of the invention can be
obtained, for example, by extraction from a natural source
(e.g., M. tuberculosis bacteria); by expression of a
recombinant nucleic acid encoding the polypeptide; or by
chemical synthesis. A polypeptide that is produced in a
cellular system different from the source from which it
naturally originates is "isolated," because it will be
separated from components which naturally accompany it. The
extent of isolation or purity can be measured by any
appropriate method, e.g., column chromatography,
polyacrylamide gel electrophoresis, or HPLC analysis.

The polypeptides may contain a primary amino acid
sequence that has been modified from those disclosed herein.
Preferably these modifications consist of conservative amino
acid substitutions. Conservative substitutions typically
include substitutions within the following groups: glycine
and alanine; valine, isoleucine, and leucine; aspartic acid
and glutamic acid; asparagine and glutamine; serine and
threonine; lysine and arginine; and phenylalanine and
tyrosine.
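
The substitution groups listed above can be expressed as a simple lookup. The following is a minimal sketch, not part of the disclosed method: the function names and the example sequences are hypothetical, and the check assumes two sequences of equal length compared position by position (no insertions or deletions).

```python
# Conservative substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"G", "A"},        # glycine, alanine
    {"V", "I", "L"},   # valine, isoleucine, leucine
    {"D", "E"},        # aspartic acid, glutamic acid
    {"N", "Q"},        # asparagine, glutamine
    {"S", "T"},        # serine, threonine
    {"K", "R"},        # lysine, arginine
    {"F", "Y"},        # phenylalanine, tyrosine
]


def is_conservative(a, b):
    """True if residues a and b are identical or share a listed group."""
    if a == b:
        return True
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def differs_only_conservatively(first, second):
    """True if the sequences have equal length and every differing
    position is a substitution within one of the listed groups."""
    if len(first) != len(second):
        return False
    return all(is_conservative(a, b) for a, b in zip(first, second))


# Hypothetical example sequences (not taken from Fig. 1).
print(differs_only_conservatively("GAVLK", "AGILR"))  # all changes within groups
print(differs_only_conservatively("GAVLK", "GAVLD"))  # K -> D crosses groups
```

The text says substitutions "typically include" these groups, so the list is illustrative rather than exhaustive; a stricter or looser grouping would only require editing `CONSERVATIVE_GROUPS`.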

The terms "protein" and "polypeptide" are used herein to
describe any chain of amino acids, regardless of length or
post-translational modification (for example, glycosylation
or phosphorylation). Thus, the term "Mycobacterium
tuberculosis polypeptide" includes full-length, naturally
occurring Mycobacterium tuberculosis protein, as well as a
recombinantly or synthetically produced polypeptide that
corresponds to a full-length naturally occurring
Mycobacterium tuberculosis protein or to particular domains
or portions of a naturally occurring protein. The term also
encompasses a mature Mycobacterium tuberculosis polypeptide
which has an added amino-terminal methionine (useful for
expression in prokaryotic cells).

As used herein, "immunogenic" means capable of
activating a primary or memory immune response. Immune
responses include responses of CD4+ and CD8+ T lymphocytes
and B lymphocytes. In the case of T lymphocytes, such
responses can be proliferative, and/or cytokine (e.g.,
interleukin (IL)-2, IL-3, IL-4, IL-5, IL-6, IL-12, IL-13,
IL-15, tumor necrosis factor-α (TNF-α), or interferon-γ
(IFN-γ))-producing, or they can result in generation of cytotoxic
T lymphocytes (CTL). B lymphocyte responses can be those
resulting in antibody production by the responding B
lymphocytes.

As used herein, "antigenic" means capable of being
recognized by either antibody molecules or antigen-specific T
cell receptors (TCR) on activated effector T cells (e.g.,
cytokine-producing T cells or CTL).

Thus, polypeptides that have "Mycobacterium tuberculosis
specific antigenic properties" are polypeptides that: (a) can
be recognized by and bind to antibodies elicited in response
to Mycobacterium tuberculosis organisms or wild-type
Mycobacterium tuberculosis molecules (e.g., polypeptides); or
(b) contain subsequences which, subsequent to processing of
the polypeptide by appropriate antigen presenting cells (APC)
and bound to appropriate major histocompatibility complex
(MHC) molecules, are recognized by and bind to TCR on
effector T cells elicited in response to Mycobacterium
tuberculosis organisms or wild-type Mycobacterium
tuberculosis molecules (e.g., polypeptides).

As used herein, polypeptides that have "Mycobacterium
tuberculosis specific immunogenic properties" are
polypeptides that: (a) can elicit the production of
antibodies that recognize and bind to Mycobacterium
tuberculosis organisms or wild-type Mycobacterium
tuberculosis molecules (e.g., polypeptides); or (b) contain
subsequences which, subsequent to processing of the
polypeptide by appropriate antigen presenting cells (APC) and
bound to appropriate major histocompatibility complex (MHC)
molecules on the surface of the APC, activate T cells with
TCR that recognize and bind to peptide fragments derived by
processing by APC of Mycobacterium tuberculosis organisms or
wild-type Mycobacterium tuberculosis molecules (e.g.,
polypeptides) and bound to MHC molecules on the surface of
the APC. The immune responses elicited in response to the
immunogenic polypeptides are preferably protective. As used
herein, "protective" means preventing establishment of an
infection or onset of a disease or lessening the severity of
a disease existing in a subject. "Preventing" can include
delaying onset, as well as partially or completely blocking
progress of the disease.

As used herein, a "functional segment of a Mycobacterium
tuberculosis polypeptide" is a segment of the polypeptide
that has Mycobacterium tuberculosis specific antigenic and
immunogenic properties.

Where a polypeptide, functional segment of a
polypeptide, or a mixture of polypeptides and/or functional
segments has been administered (e.g., by intradermal
injection) to a subject for the purpose of testing for an M.
tuberculosis infection or susceptibility to such an
infection, "detecting an immune response" means examining the
subject for signs of an immunological reaction to the
administered material, e.g., reddening or swelling of the
skin at the site of an intradermal injection. Where the
subject has antibodies to the administered material, the
response will generally be rapid, e.g., 1 minute to 24 hours.
On the other hand, a memory or activated T cell reaction of
pre-immunized T lymphocytes in the subject is generally
slower, appearing only after 24 hours and being maximal at
24-96 hours.

As used herein, a "subject" can be a human subject or a 
non-human mammal such as a non-human primate, a horse, a 
bovine animal, a pig, a sheep, a goat, a dog, a cat, a 
rabbit, a guinea pig, a hamster, a rat, or a mouse. 

Unless otherwise defined, all technical and scientific
terms used herein have the same meaning as commonly
understood by one of ordinary skill in the art to which this
invention pertains. In case of conflict, the present
document, including definitions, will control. Preferred
methods and materials are described below, although methods
and materials similar or equivalent to those described herein
can be used in the practice or testing of the present
invention. Unless otherwise indicated, these materials and
methods are illustrative only and are not intended to be
limiting. All publications, patent applications, patents and
other references mentioned herein are illustrative only and
not intended to be limiting.

Other features and advantages of the invention, e.g.,
methods of diagnosing or vaccinating against M. tuberculosis
infection, will be apparent from the following description,
from the drawings and from the claims.

Brief Description of the Drawings 
Figure 1 is a depiction of the amino acid sequences of
M. tuberculosis polypeptides MTSP1-MTSP47.

Figure 2 is a depiction of the nucleotide sequences of
the coding regions (mtsp1-mtsp47) encoding MTSP1-MTSP47.

Fig. 3A is a line graph showing the distribution of
SPSCAN scores for the 3924 M. tuberculosis protein sequences
obtained from the Sanger Centre website.

Fig. 3B is a line graph showing the distribution of
SignalP scores for the 3924 protein sequences obtained from
the Sanger Centre website.

Fig. 3C is a "dot plot" of SignalP scores versus SPSCAN
scores for the individual 3924 protein sequences obtained
from the Sanger Centre website.

Fig. 4 is an enlargement of Fig. 3C.

Detailed Description 
It is generally believed that proteins that are actively
secreted by bacteria, especially intracellular bacteria
(e.g., Salmonella typhi and M. tuberculosis), are effective
as antigens that are capable of inducing protective immunity
to the organism. A number of open reading frames (ORFs)
(i.e., DNA sequences that encode a protein) were predicted
from the genomic sequence of M. tuberculosis [Cole et al.
(1998) Nature 393:537-544]. The instant invention is based
on the identification of a number of ORFs of this group that
encode secreted polypeptides (see Example 1). The
polypeptides encoded by the ORFs thus identified are
designated M. tuberculosis Secreted Polypeptides (MTSP) and
the DNA sequences encoding them are designated mtsp. Because
they are secreted, we believe that the MTSP are both
immunogenic and antigenic. The immune responses that they
induce in subjects exposed to them are preferably also
protective against M. tuberculosis infection in the subjects.
The amino acid sequences of MTSP1-MTSP44 are shown in Fig. 1
and the nucleotide sequences of mtsp1-mtsp44 are shown in
Fig. 2.






WO 00/66143 




PCT/US00/12197 



The invention encompasses: (a) isolated DNA molecules
containing sequences (e.g., mtsp1-mtsp47) encoding
polypeptides (e.g., MTSP1-MTSP47) secreted by M. tuberculosis
and isolated portions of such DNA molecules that encode
polypeptide segments having antigenic and immunogenic
properties (i.e., functional segments); (b) the secreted
polypeptides themselves (e.g., MTSP1-MTSP47) and functional
segments of them; (c) antibodies (including antigen binding
fragments, e.g., F(ab')2, Fab, Fv, and single chain Fv
fragments of such antibodies) that bind to the MTSP1-MTSP47
polypeptides and functional segments; (d) nucleic acid
molecules (e.g., vectors) containing and capable of
expressing one or more of the DNA molecules containing the
mtsp1-mtsp47 sequences and portions of DNA molecules; (e)
cells (e.g., bacterial, yeast, insect, or mammalian cells)
transformed by such vectors; (f) compositions containing
vectors encoding one or more M. tuberculosis polypeptides (or
functional segments) including both the MTSP1-MTSP47
polypeptides (or functional segments thereof) and previously
described M. tuberculosis polypeptides such as ESAT-6, 14 kDa
antigen, MPT63, 19 kDa antigen, MPT64, MPT51, MTC28, 38 kDa
antigen, 45/47 kDa antigen, MPB70, Ag85 complex, MPT53, and
KatG (see also U.S. application no. 08/796,792); (g)
compositions containing one or more M. tuberculosis
polypeptides (or functional segments), including both the
polypeptides of the invention and previously described M.
tuberculosis polypeptides such as those described above; (h)
compositions containing one or more of the antibodies described
in (c); (i) methods of diagnosis involving either (1)
administration (e.g., intradermal injection) of the
MTSP1-MTSP44 polypeptides of the invention, functional segments
thereof, or mixtures of one or more such polypeptides and/or
functional segments to a subject suspected of having or being
susceptible to M. tuberculosis infection, (2) in vitro
testing of lymphocytes from such a subject for responsiveness
to the MTSP1-MTSP47 polypeptides, functional segments
thereof, or the above mixtures, or (3) testing of a bodily
fluid (e.g., blood, saliva, plasma, serum, urine, or semen or
a lavage such as a bronchoalveolar lavage, a vaginal lavage,
or lower gastrointestinal lavage) for antibodies to the
MTSP1-MTSP47 polypeptides or functional segments thereof, or
the above-described mixtures; and (j) methods of vaccination
involving administration to a subject of the compositions of
either (f), (g), (h) or a combination of any two or even all
three compositions.

With respect to diagnosis, purified M. tuberculosis
proteins, functional segments of such proteins, or mixtures
of proteins and/or the functional fragments have the
advantage of discriminating infection by M. tuberculosis from
infection by other bacteria, and in particular,
non-pathogenic mycobacteria. Of particular benefit in such
assays are proteins encoded by genes present in M.
tuberculosis, and possibly other members of the
M. tuberculosis complex (e.g., M. tuberculosis, M. bovis, M.
microti, and M. africanum), but absent from the Bacille
Calmette-Guerin (BCG) attenuated strain of M. bovis which has
been commonly used for vaccination. Use of such proteins
(e.g., the MTSP16 protein whose sequence is shown in Fig. 1)
for diagnosis allows for discrimination between infection by
M. tuberculosis and vaccination with BCG. Furthermore,
compositions containing the M. tuberculosis proteins,
functional segments of them, or mixtures of the proteins
and/or the functional segments allow for improved quality
control since "batch-to-batch" variability is greatly reduced
in comparison to complex mixtures such as purified protein
derivative (PPD) of tuberculin.

Where vaccination is performed with nucleic acids, both
in vivo and ex vivo methods can be used. In vivo methods
involve administration of the nucleic acids themselves to the
subject, and ex vivo methods involve obtaining cells (e.g.,
bone marrow cells or fibroblasts) from the subject,
transducing the cells with the nucleic acids, preferably
selecting or enriching for successfully transduced cells, and
administering the transduced cells to the subject.
Alternatively, the cells that are transduced and administered
to the subject can be derived from another subject. Methods
of vaccination and diagnosis are described in greater detail
in U.S. application no. 08/796,792, which is incorporated
herein by reference in its entirety.

The following example is meant to illustrate, not limit,
the invention.

Example 1. Computer Aided Identification of M. tuberculosis
Secreted Proteins

Software.

The software used to manipulate and analyze protein
sequences was available from public web servers or was part
of the Genetics Computer Group (GCG) package [Wisconsin
Package Version 9.1, Genetics Computer Group (GCG), Madison,
Wisc.]. Customized C-Shell scripts were used to automate
some of the tasks or to extract selected information from the
output of some of the programs. Signal peptides were
predicted with SPSCAN, which is part of the GCG package, and
SignalP, a program originating from the Center for Biological
Sequence Analysis at the Technical University of Denmark,
Lyngby, Denmark and currently available on the Internet at
http://www.cbs.dtu.dk/services/SignalP. Putative
transmembrane segments were identified with the program
TMpred and prokaryotic membrane lipoprotein lipid attachment
sites with the program PrositeScan, both programs originating
from the Bioinformatics Group at the Swiss Institute for
Experimental Cancer Research in Epalinges, Switzerland, and
currently available on the Internet at
http://www.isrec.isb-sib.ch/software/TMPRED_form.html and
http://www.isrec.isb-sib.ch/software/PSTSCAN_form.html,
respectively. Protein similarity and relatedness were
established with GAP and PILEUP, both in the GCG package;
Blast, originating from the National Center for Biotechnology
Information of the National Institutes of Health, Bethesda,
MD and currently available on the Internet at
http://www.ncbi.nlm.nih.gov/BLAST/; and AllAll, originating
from the Swiss Institute of Technology, Zurich, Switzerland,
and currently available on the Internet at
http://cbrg.inf.ethz.ch/subsection3_1_1.html.

Prediction of M. tuberculosis proteins with signal peptides 

The amino acid sequences of the 3924 proteins predicted
by the analysis of the M. tuberculosis genomic sequence have
been made available by the Sanger Centre, Cambridge, England,
and were downloaded from the current Sanger Centre website
[http://www.sanger.ac.uk/Projects/M_tuberculosis/]. Segments
containing the first 70 amino acids of each predicted protein
were analyzed by a system of our own design utilizing two
different computer programs (SPSCAN and SignalP) designed to
predict the occurrence of signal peptides. We concluded that
combining the output from the two programs would increase the
reliability of the selection. Both programs can detect
signal peptides in polypeptides from eukaryotic and
prokaryotic organisms, including gram-positive and
gram-negative bacteria. To analyze the M. tuberculosis proteins,
the gram-positive mode was used. We performed an analysis
with SPSCAN allowing only one prediction per protein, setting
the minimum score threshold at -10, both in the standard and
the adjusted modes. In the adjusted mode, signal peptides
longer than a certain threshold value are penalized. We
found that the correlation between the scores obtained with
SPSCAN in the standard and adjusted modes increased with the
value of the score, i.e., signal peptides that received high
scores in standard mode also had high scores in the adjusted
mode. We determined to use only the adjusted mode in
subsequent steps.
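
The first preprocessing step described above, taking the first 70 amino acids of each predicted protein, can be sketched as follows. This is an illustrative sketch only: the original work used C-Shell scripts and the GCG package, the input is assumed here to be FASTA-formatted text, and the function names and example records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}.
    The identifier is the first whitespace-delimited token after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def n_terminal_segments(records, length=70):
    """Return the first `length` residues of each sequence; shorter
    sequences are returned whole."""
    return {name: seq[:length] for name, seq in records.items()}


# Hypothetical input: one 100-residue record and one 3-residue record.
example = ">p1 hypothetical protein\n" + "M" * 40 + "\n" + "A" * 60 + "\n>p2\nMKT\n"
segments = n_terminal_segments(read_fasta(example))
print({name: len(seq) for name, seq in segments.items()})
```

The truncated segments would then be submitted to the signal peptide predictors; that step depends on the external programs and is not sketched here.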

To define cutoff values for the scores obtained with
SPSCAN (in adjusted mode) and SignalP, we took into account
the following factors: (a) SignalP scores above 0.34 are
generally considered significant; (b) the analysis of the
Haemophilus influenzae genome with SignalP yielded the
prediction that about 10% of the encoded proteins contain a
signal peptide; and (c) the average scores of thirteen known
secreted or membrane-associated M. tuberculosis antigens were
9.11 (standard deviation (SD)=1.8) and 0.55 (SD=0.15), as
calculated as above utilizing SPSCAN and SignalP,
respectively (Table 1).

Of the 3924 M. tuberculosis protein sequences downloaded
from the Sanger Centre website, about 10% of the sequences
had SPSCAN scores equal to or higher than 8 (Fig. 3A) and about
10% of the sequences had SignalP scores equal to or higher than
0.4 (Fig. 3B). We tentatively adopted these score values as
"cutoffs" and we used the cutoffs to construct a list of
proteins that were likely to be either secreted or exposed at
the bacterial cell surface. This list included those
proteins with SPSCAN scores higher than 8 and SignalP scores
higher than 0.4. We refer to this group of proteins (208
entries, about 5% of the proteome) as the "Top208" group
(Fig. 3C and Fig. 4).
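
The combined-cutoff selection described above can be sketched in a few lines. This is a hedged illustration, not the original scripts: the function names are hypothetical, and because the text uses both "equal to or higher than" and "higher than" for the same cutoffs, the comparison is strict by default with an `inclusive` switch. The example scores are taken from Table 1.

```python
def fraction_at_or_above(scores, cutoff):
    """Fraction of scores that are greater than or equal to the cutoff."""
    return sum(1 for s in scores if s >= cutoff) / len(scores)


def select_candidates(scores, spscan_cutoff=8.0, signalp_cutoff=0.4,
                      inclusive=False):
    """Return the sorted names of proteins passing BOTH cutoffs.

    `scores` maps a protein name to (SPSCAN score, SignalP score).
    With inclusive=False a score must be strictly higher than the cutoff.
    """
    def passes(value, cutoff):
        return value >= cutoff if inclusive else value > cutoff

    return sorted(
        name for name, (spscan, signalp) in scores.items()
        if passes(spscan, spscan_cutoff) and passes(signalp, signalp_cutoff)
    )


# Four known antigens with their Table 1 scores (SPSCAN, SignalP).
known = {
    "19 kDa": (5.9, 0.331),
    "MPT83": (7.1, 0.298),
    "MPT51": (11.0, 0.758),
    "MPT63": (8.0, 0.57),
}
print(select_candidates(known))                  # strict comparison
print(select_candidates(known, inclusive=True))  # cutoffs included
```

MPT63, whose SPSCAN score is exactly 8, shows why the choice between the two readings matters for borderline proteins.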



Table 1. SPSCAN and SignalP Scores of Known Secreted or
Membrane Associated M. tuberculosis Polypeptide Antigens

Polypeptide Antigen   Alternative Names               SPSCAN Score   SignalP Score
19 kDa                                                5.9            0.331
38 kDa                antigen 5                       [illegible]    0.505
45/47 kDa                                             11.2           0.627
MPT44                 Ag85A, P32, FbpA                9.2            0.425
MPT45                 Ag85C, FbpC                     10.1           0.496
MPT51                                                 11             0.758
MPT53                                                 9.4            0.581
MPT59                 Ag85B, α antigen, Ag 6, FbpB    9.7            0.629
MPT63                                                 8              0.57
MPT64                                                 10.2           0.83
MPT70                                                 9              0.459
MPT83                                                 7.1            0.298
MTC28                                                 11.4           0.7
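
The SignalP summary statistics quoted for the thirteen known antigens can be checked directly from the SignalP column of Table 1. This is a minimal sketch using Python's standard `statistics` module; the source does not say whether the reported SD is the sample or the population estimate, so both are computed here as an assumption-free comparison. The SPSCAN column is not recomputed because one of its entries is not legible in this copy.

```python
import statistics

# SignalP scores of the thirteen known antigens, in Table 1 row order.
signalp_scores = [
    0.331, 0.505, 0.627, 0.425, 0.496, 0.758, 0.581,
    0.629, 0.57, 0.83, 0.459, 0.298, 0.7,
]

mean = statistics.mean(signalp_scores)
sample_sd = statistics.stdev(signalp_scores)       # divides by n - 1
population_sd = statistics.pstdev(signalp_scores)  # divides by n

print(f"n = {len(signalp_scores)}")
print(f"mean = {mean:.3f}")
print(f"sample SD = {sample_sd:.3f}, population SD = {population_sd:.3f}")
```

The mean rounds to the 0.55 stated in the text, and both SD estimates fall close to the stated 0.15.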



Prediction of M. tuberculosis secreted proteins 

A signal peptide may target a protein to the membrane 
but does not define a secreted protein, because additional 
transmembrane segments within the mature protein molecule can 
be present. In addition, lipoproteins are also targeted to 
the membrane by a signal peptide, but are not all secreted 
since cleavage of the signal peptide is coupled with the 
attachment of an acyl glycerol group that anchors the protein 
to the membrane. In light of these considerations and the 
fact that SignalP is not designed to differentiate 
lipoprotein signal peptides from secretory signal peptides, 
we believe that the Top208 group contains lipoproteins and 
proteins with multiple transmembrane segments, in addition to 
secreted proteins. 

The number of putative transmembrane segments and the
presence of lipoprotein lipid attachment sites were assessed
by analyzing the Top208 proteins with TMpred and PrositeScan.
TMpred identifies putative transmembrane segments by
comparing a query amino acid sequence with a database of
amino acid sequences of experimentally defined transmembrane
segments. Scores higher than 500 are considered significant.
PrositeScan compares query amino acid sequences against the
Prosite database of protein motifs. The prokaryotic
lipoprotein lipid attachment site motif is entry number
PS00013. Our methodology identified a class of secreted
proteins (the "Top208-TM1" group that included MTSP1-MTSP44)
which were characterized by a single transmembrane segment
(with score higher than 500) in the position predicted for
the signal peptide and in which no lipoprotein motifs were
identified. Other proteins had additional transmembrane
segments with scores higher than 500, had lipoprotein motifs,
or were excluded from the analysis because they belonged to
the PE/PPE/PGRS families of proteins [Cole et al., 1998] and
their biased amino acid composition made it difficult to
obtain reliable results with SPSCAN, SignalP, or TMpred. A
summary of the characteristics of the proteins we assigned to
the Top208-TM1 group is presented in Table 2 and data
regarding proteins MTSP1-MTSP47 are presented in Table 3.
The amino acid sequences of the proteins are listed in Fig. 1
and the nucleotide sequences of the ORFs encoding them
(mtsp1-mtsp47) are listed in Fig. 2.






Table 2. Features defining the M. tuberculosis proteins
included in the Top208-TM1 group.

1. A signal peptide with score higher than 0.4 was predicted
with SignalP in the first 70 amino acids.

2. A signal peptide with score higher than 8 was predicted
with SPSCAN in the first 70 amino acids.

3. A single transmembrane segment, with a score greater than
500 and coinciding approximately with the putative signal
peptide, was predicted by TMpred.

4. No lipoprotein lipid attachment sites were identified with
PrositeScan.
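
The four features of Table 2 can be combined into a single predicate. The following is a sketch under stated assumptions, not the original implementation: the `Prediction` record and its field names are hypothetical, the inputs are assumed to have been parsed already from the output of SignalP, SPSCAN, TMpred, and PrositeScan, and "coinciding approximately" is interpreted here as any positional overlap between the transmembrane segment and the predicted signal peptide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    signalp_score: float
    spscan_score: float
    signal_peptide: Tuple[int, int]             # (start, end), 1-based inclusive
    tm_segments: List[Tuple[float, int, int]]   # (TMpred score, start, end)
    has_lipoprotein_site: bool                  # PS00013 motif found


def in_top208_tm1(p, signalp_cutoff=0.4, spscan_cutoff=8.0, tm_cutoff=500.0):
    """Apply features 1-4 of Table 2 to one protein's predictions."""
    # Features 1 and 2: both signal peptide scores exceed their cutoffs.
    if not (p.signalp_score > signalp_cutoff and p.spscan_score > spscan_cutoff):
        return False
    # Feature 3: exactly one significant transmembrane segment ...
    significant = [seg for seg in p.tm_segments if seg[0] > tm_cutoff]
    if len(significant) != 1:
        return False
    # ... that overlaps the predicted signal peptide (assumed interpretation).
    _, tm_start, tm_end = significant[0]
    sp_start, sp_end = p.signal_peptide
    if not (tm_start <= sp_end and tm_end >= sp_start):
        return False
    # Feature 4: no lipoprotein lipid attachment site.
    return not p.has_lipoprotein_site


# Hypothetical predictions for illustration only.
secreted_like = Prediction(0.672, 12.4, (1, 32), [(1800.0, 10, 30)], False)
membrane_like = Prediction(0.7, 11.0, (1, 30), [(1500.0, 8, 28), (900.0, 120, 140)], False)
print(in_top208_tm1(secreted_like), in_top208_tm1(membrane_like))
```

A protein with a second significant transmembrane segment or a PS00013 match is rejected, which mirrors the exclusion of multi-pass membrane proteins and lipoproteins described in the text.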
Table 3. Proteins included in the Top208-TM1 group.

Protein   No. of Amino Acids   SPSCAN Score   SPSCAN Sequence   SignalP Score   SignalP Sequence
MTSP20    130                  12.4           1-32              0.672           1-32
MTSP21    109                  8.4            1-22              0.631           1-22
MTSP23    114                  10.2           1-34              0.592           1-34
MTSP16    126                  9.2            1-28              0.557           1-36
MTSP24    125                  11.4           1-35              0.73            1-35
MTSP14    144                  8.9            1-34              0.584           1-34
MTSP13    157                  10             1-32              0.753           1-32
MTSP22    124                  8.6            1-30              0.592           1-30
MTSP25    155                  9.5            35-49             0.842           1-49
MTSP27    233                  13.8           1-29              0.787           1-29
MTSP11    233                  10.9           1-32              0.779           1-32
MTSP26    382                  8.3            1-34              0.721           1-34
MTSP12    214                  12.6           1-28              0.71            1-28
MTSP8     158                  9.1            1-33              0.695           1-30
MTSP10    155                  8.8            15-45             0.669           1-45
MTSP28    295                  14.8           1-31              0.667           1-31
MTSP9     241                  10             1-22              0.635           1-22
MTSP29    380                  12.4           1-27              0.621           1-27
MTSP2     111                  10.6           1-28              0.579           1-28
MTSP4     177                  8.7            1-25              0.578           1-24
MTSP17    219                  8.9            1-29              0.543           1-29
MTSP3     282                  11.5           1-32              0.538           1-32
MTSP18    220                  8.8            38-68             0.537           1-68
MTSP6     219                  8.4            1-34              0.537           1-34
MTSP7     136                  11.7           1-24              0.53            1-24
MTSP31    457                  9.1            1-18              0.494           1-25
MTSP30    286                  8.3            15-37             0.469           1-37
MTSP1     104                  8.2            1-28              0.466           1-28
MTSP15    134                  10             1-21              0.458           1-56
MTSP32    449                  8.8            1-23              0.444           1-23
MTSP19    169                  10.5           28-53             0.438           1-53
MTSP5     568                  9.9            1-31              0.432           1-31
MTSP33    113                  11.9           1-25              0.873           1-25
MTSP41    112                  12             1-33              0.663           1-3
MTSP38    173                  10.5           1-28              0.697           1-28
MTSP35    408                  8.8            1-33              0.616           1-33
MTSP34    149                  13.7           1-23              0.888           1-23
MTSP36    168                  11.3           1-28              0.824           1-27
MTSP42    521                  8.4            1-34              0.679           1-34
MTSP44    149                  11             1-30              0.661           1-30
MTSP37    228                  9.4            1-23              0.598           1-23
MTSP40    231                  9.2            1-30              0.55            1-30
MTSP43    137                  8.2            1-36              0.485           1-37
MTSP39    509                  8.6            1-35              0.413           1-38
MTSP45    145                  8.4            1-46              0.412           1-62
MTSP46    143                  8.5            1-27              0.555           1-66
MTSP47    171                  8.3            1-35              0.424           1-30



SPSCAN sequence and SignalP sequence show the sequence, in terms of amino 
acid residue numbers, included in the signal peptide predicted by SPSCAN 
5 and SignalP, respectively. 
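The two "Sequence" columns of Table 3 can be compared programmatically. The following sketch checks whether SPSCAN and SignalP place the end of the predicted signal peptide at the same residue. The four rows are transcribed from Table 3; the helper names `parse_range` and `same_cleavage_site`, and the treatment of the last residue of each range as the putative cleavage position, are assumptions made for illustration.

```python
def parse_range(text: str) -> tuple:
    """Convert a residue range such as '15-45' into (15, 45)."""
    start, end = text.split("-")
    return int(start), int(end)


# (SPSCAN sequence, SignalP sequence) for four proteins, copied from Table 3
ROWS = {
    "MTSP20": ("1-32", "1-32"),
    "MTSP16": ("1-28", "1-36"),
    "MTSP25": ("35-49", "1-49"),
    "MTSP10": ("15-45", "1-45"),
}


def same_cleavage_site(spscan: str, signalp: str) -> bool:
    """True if both programs end the signal peptide at the same residue."""
    return parse_range(spscan)[1] == parse_range(signalp)[1]


agreement = {name: same_cleavage_site(*ranges) for name, ranges in ROWS.items()}
print(agreement)
```

For MTSP25 and MTSP10 the two programs report different start residues but the same end residue, so under this assumption they still agree on where the signal peptide ends; MTSP16 is a case where they differ.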
Table 4. Presence of mtsp coding regions in various strains of
Mycobacterium tuberculosis.

The inventors have found, by standard DNA hybridization
Southern blotting techniques using the indicated coding
regions as probes and DNA isolated from the indicated strains
of Mycobacteria, that some of the coding regions are specific
for the M. tuberculosis complex (Table 4).

Although the invention has been described with reference
to the presently preferred embodiment, it should be
understood that various modifications can be made without
departing from the spirit of the invention. Accordingly, the
invention is limited only by the following claims.
What is claimed is:

1. An isolated DNA molecule comprising a DNA sequence
encoding a polypeptide with a first amino acid sequence
selected from the group consisting of the amino acid
sequences of the polypeptides MTSP1, MTSP2, MTSP3, MTSP4,
MTSP5, MTSP6, MTSP7, MTSP8, MTSP9, MTSP10, MTSP11, MTSP12,
MTSP13, MTSP14, MTSP15, MTSP16, MTSP17, MTSP18, MTSP19,
MTSP20, MTSP21, MTSP22, MTSP23, MTSP24, MTSP25, MTSP26,
MTSP27, MTSP28, MTSP29, MTSP30, MTSP31, MTSP32, MTSP33,
MTSP34, MTSP35, MTSP36, MTSP37, MTSP38, MTSP39, MTSP40,
MTSP41, MTSP42, MTSP43, MTSP44, MTSP45, MTSP46, and MTSP47 as
depicted in Fig. 1,

or a second amino acid sequence identical to said first
amino acid sequence but with conservative substitutions,

wherein said polypeptide has Mycobacterium tuberculosis
specific antigenic and immunogenic properties.

2. An isolated portion of the DNA molecule of claim 1,
said portion encoding a segment of said polypeptide shorter
than the full-length polypeptide, said segment having
Mycobacterium tuberculosis specific antigenic and immunogenic
properties.

3. A vector comprising:

(a) the DNA molecule of claim 1; and

(b) transcriptional and translational regulatory
sequences operationally linked to said DNA sequence, said
regulatory sequences allowing for expression of the
polypeptide encoded by said DNA sequence in a cell.

4. A vector comprising:

(a) the DNA molecule of claim 2; and

(b) transcriptional and translational regulatory
sequences operationally linked to said DNA sequence, said
regulatory sequences allowing for expression of the
polypeptide encoded by said DNA sequence in a cell.

5. A cell transformed with the vector of claim 3.

6. A cell transformed with the vector of claim 4.

7. A composition comprising the vector of claim 3 and a
pharmaceutically acceptable diluent or filler.

8. A composition comprising the vector of claim 4 and a
pharmaceutically acceptable diluent or filler.

9. A composition for use as a DNA vaccine, said
composition comprising at least two DNA sequences, each
encoding a polypeptide of the Mycobacterium tuberculosis
complex or a functional segment thereof, said DNA sequences
being operationally linked to transcriptional and
translational regulatory sequences which allow for expression
of each said polypeptide in a cell of a vertebrate,

wherein at least one of said DNA sequences is the
sequence of claim 1.

10. A composition for use as a DNA vaccine, said
composition comprising at least two DNA sequences, each
encoding a polypeptide of the Mycobacterium tuberculosis
complex or a functional segment thereof, said DNA sequences
being operationally linked to transcriptional and
translational regulatory sequences which allow for expression
of each said polypeptide in a cell of a vertebrate,

wherein at least one of said DNA sequences is the
sequence of claim 2.

11. An isolated polypeptide with a first amino acid
sequence selected from the group consisting of the sequences
of the polypeptides MTSP1, MTSP2, MTSP3, MTSP4, MTSP5, MTSP6,
MTSP7, MTSP8, MTSP9, MTSP10, MTSP11, MTSP12, MTSP13, MTSP14,
MTSP15, MTSP16, MTSP17, MTSP18, MTSP19, MTSP20, MTSP21,
MTSP22, MTSP23, MTSP24, MTSP25, MTSP26, MTSP27, MTSP28,
MTSP29, MTSP30, MTSP31, MTSP32, MTSP33, MTSP34, MTSP35,
MTSP36, MTSP37, MTSP38, MTSP39, MTSP40, MTSP41, MTSP42,
MTSP43, MTSP44, MTSP45, MTSP46, and MTSP47 as depicted in
Fig. 1,

or a second amino acid sequence identical to said first
amino acid sequence but with conservative substitutions,

wherein said polypeptide has Mycobacterium tuberculosis
specific antigenic and immunogenic properties.

12. An isolated segment of the polypeptide of claim 11, 
said segment being shorter than the full-length polypeptide 
and having Mycobacterium tuberculosis specific antigenic and 
immunogenic properties. 

13. A composition comprising the polypeptide of claim
11, or a functional segment thereof, and a pharmaceutically
acceptable diluent or filler.

14. A composition comprising the polypeptide of claim
12, or a functional segment thereof, and a pharmaceutically
acceptable diluent or filler.

15. A composition comprising at least two polypeptides 
of the Mycobacterium tuberculosis complex, or functional 
segments thereof, wherein at least one of said at least two 
polypeptides is the sequence of claim 1. 



16. A composition comprising at least two polypeptides
of the Mycobacterium tuberculosis complex, or functional
segments thereof, wherein at least one of said at least two
polypeptides is the segment of claim 2.

17. A method of diagnosis comprising:

(a) administration of the composition of claim 13 to a
subject suspected of having or being susceptible to
Mycobacterium tuberculosis infection; and

(b) detecting an immune response in said subject to
said composition, as an indication that said subject has or
is susceptible to Mycobacterium tuberculosis infection.

18. A method of diagnosis comprising:

(a) administration of the composition of claim 14 to a
subject suspected of having or being susceptible to
Mycobacterium tuberculosis infection; and

(b) detecting an immune response in said subject to
said composition, as an indication that said subject has or
is susceptible to Mycobacterium tuberculosis infection.

19. A method of diagnosis comprising:

(a) administration of the composition of claim 15 to a
subject suspected of having or being susceptible to
Mycobacterium tuberculosis infection; and

(b) detecting an immune response in said subject to
said composition as an indication that said subject has or is
susceptible to Mycobacterium tuberculosis infection.

20. A method of diagnosis comprising:

(a) administration of the composition of claim 16 to a
subject suspected of having or being susceptible to
Mycobacterium tuberculosis infection; and

(b) detecting an immune response in said subject to
said composition as an indication that said subject has or is
susceptible to Mycobacterium tuberculosis infection.

21. A method of diagnosis comprising:

(a) providing a population of cells comprising CD4 T
lymphocytes from a subject;

(b) providing a population of cells comprising antigen
presenting cells (APC) expressing a major histocompatibility
complex (MHC) class II molecule expressed by said subject;

(c) contacting the CD4 lymphocytes of (a) with the APC
of (b) in the presence of the polypeptide of claim 1; and

(d) determining the ability of said CD4 lymphocytes to
respond to said polypeptide, as an indication that said
subject has or is susceptible to Mycobacterium tuberculosis
infection.

22. A method of diagnosis comprising:

(a) providing a population of cells comprising CD4 T
lymphocytes from a subject;

(b) providing a population of cells comprising antigen
presenting cells (APC) expressing at least one major
histocompatibility complex (MHC) class II molecule expressed
by said subject;

(c) contacting the CD4 lymphocytes of (a) with the APC
of (b) in the presence of the segment of claim 2; and

(d) determining the ability of said CD4 lymphocytes to
respond to said polypeptide, as an indication that said
subject has or is susceptible to Mycobacterium tuberculosis
infection.

23. A method of diagnosis comprising:

(a) providing a population of cells comprising CD4 T
lymphocytes from a subject;

(b) providing a population of cells comprising antigen
presenting cells (APC) expressing at least one major
histocompatibility complex (MHC) class II molecule expressed
by said subject;

(c) contacting the CD4 lymphocytes of (a) with the APC
of (b) in the presence of the composition of claim 15; and

(d) determining the ability of said CD4 lymphocytes to
respond to said polypeptide, as an indication that said
subject has or is susceptible to Mycobacterium tuberculosis
infection.

24. A method of diagnosis comprising:

(a) providing a population of cells comprising CD4 T
lymphocytes from a subject;

(b) providing a population of cells comprising antigen
presenting cells (APC) expressing at least one major
histocompatibility complex (MHC) class II molecule expressed
by said subject;

(c) contacting the CD4 lymphocytes of (a) with the APC
of (b) in the presence of the composition of claim 16; and

(d) determining the ability of said CD4 lymphocytes to
respond to said polypeptide, as an indication that said
subject has or is susceptible to Mycobacterium tuberculosis
infection.

25. A method of diagnosis comprising:

(a) contacting the polypeptide of claim 11 with a bodily
fluid of a subject;

(b) detecting the presence of binding of antibody to
said polypeptide, as an indication that said subject has or
is susceptible to Mycobacterium tuberculosis infection.

26. A method of diagnosis comprising:

(a) contacting the segment of claim 12 with a bodily
fluid of a subject;

(b) detecting the presence of binding of antibody to
said polypeptide, as an indication that said subject has or
is susceptible to Mycobacterium tuberculosis infection.

27. A method of diagnosis comprising:

(a) contacting the composition of claim 15 with a bodily
fluid of a subject;

(b) detecting the presence of binding of antibody to
said composition, as an indication that said subject has or
is susceptible to Mycobacterium tuberculosis infection.

28. A method of diagnosis comprising:

(a) contacting the composition of claim 16 with a bodily
fluid of a subject;

(b) detecting the presence of binding of antibody to
said composition, as an indication that said subject has or
is susceptible to Mycobacterium tuberculosis infection.

29. A method of vaccination comprising administration
of the composition of claim 7 to a subject.

30. A method of vaccination comprising administration
of the composition of claim 8 to a subject.

31. A method of vaccination comprising administration
of the composition of claim 9 to a subject.

32. A method of vaccination comprising administration
of the composition of claim 10 to a subject.

33. A method of vaccination comprising administration
of the composition of claim 13 to a subject.

34. A method of vaccination comprising administration
of the composition of claim 14 to a subject.

35. A method of vaccination comprising administration
of the composition of claim 15 to a subject.

36. A method of vaccination comprising administration
of the composition of claim 16 to a subject.
FIG. 1

SlfioFGVSAVAAAAIGIGAGSGIAAAFDGEDEVTGPDADRARAAAVQAVPGGTAGEVE 

^?geS?gvuvtrpdgtrvevhldrdfrvldtepadgdgg* 



mp^^^TALS AGVGAVAMS LTVGAG VAS AD PVDAV I NTTCNYGQVVAALNATD PG AAAQFN 

aspvaSsyl^f^pppqRAamaaqlqavpg 



pqsg^gsggagsggggggdgpvepsparpmppgfirlap* 
dgSdsgfypvatldiprdaqtavnas* 

FFHTTDGGPTAGC VAI DDATLVQ 1 1 RVJLRPGAVI AI AK* 

MTSP7 Tnr i aP vArAr)PORYDGDVPGMNYDASLGAPCSSWERFIFGRGPSG 
QPGWFTGAGFFPPEP* 




FIG. 1 (continued) 

MTSP8 

MGELRLVGGVLRVLVWGAVFDVAVLNAGAASADGPVQLKSRLGDVCLDAPSGSWFSPLV 
INPCNGTDFQRWNLTDDRQVESVAFPGECVNIGNALWARLQPCVNWISQHWTVQPDGLVK 
SDLDACLTVL.GGPDPGTWVSTRWCDPNAPDQQWDSVP* 

MTSP9 

i^PAMTARSVVLSVLLGAHPAWATASELIQLTADFGIKETTLRVALTRMVGAGDLVRSADG 
YRLSDRLLARQRRQDEAMRPRTRAWHGNWHMLIVTSIGTDARTRAALRTCMHHKRFGELR 
EGVWMRPDNLDLDLESDVAARVRMLTARDEAPADLAGQLWDLSGWTEAGHRLLGDMAAAT 
DMPGRFWAAAMVRHLLTDPMLPAELLPADWPGAGLRAAYHDFATAMAKRRDATQLLEVT 



^fe^NASGSVLDMTSVRTVPSAVALVTFAGAALSGVIPAIARADPVGHQVTYTVTTTS 
DLMANIRYMSADPPSMAAFNADSSKYMITLHTPIAGGQPLVYTATLANPSQWAIVTASGG 

LRWPEFHCEIVVDGQWVSQDGGSGVQCSTRPW* 

SIIkiatafktatfalaagavalglaspadaaagtmygdpaaaakywrqqtyddcvlms 

^ADVIGOVTGREPSERAI I KVAQSTPSWHPGS I YTKPADAEHPNSGMGTSVADI PTLLA 

^SSvitdedhatatgvatgmaaleqylgsghavivsinaemiwgqpveetdsagnpr 
sdi^v^gvdtengivhlndsgtptgrdeqipmetfveawatshdfmavtt* 

MGVTA^VVGVAACGLSIjAVLiAAAPTAGAEPTGALPPMTSSGSGPVIGDGDAAIjRQRISQQ 
f?SFG^PT?QE^GSDAAQFITAAAAVADRDVASVFLPLQRVLGCQQNTAGSGAGFGARA 
YRRTDGQWGGAMLVVAKSTVSDVDALKACVKSGWRKATAGTPTSMCNNGWTY 
GEEG YF VLLAGTAS DFC S APNANYRTTAS S W PG * 

SlflpAPSPAAAFAVAGLILAGWAGSVGLAGADPEPAPTPKTAIDSDGTYAVGIDIAPGT 

yssagpvgdg^cywkrVignpdgalidnalskkpqvvtieptdkafk^ 
aapagvpgpeagaqlqnqlg i lngllgptggrvpqp * 



MTTNLRRRTAMAAAGLGAALGLGILLVPTVDAHLANGSMSEVMMSEIAGLPIPPIIHYGA 
RRAAEDDAVNRLEGGR I VNWACN* 



^f^AWLIALWAEHVHHDAAADWLMASDTGFATC^ 

^AVQCTSRHEFWPDALSFAGVEVAGWGHRQVTDAYIAQLARSHDGQLATLDSGLAH 
LHGDVAVLI PTTT* 




FIG. 1 (continued) 

MTSP16

VOROSLMPQQTLAAGVFVGALLCGWTAAVPPHARADWAYLVTWTVRPGYNFANM 
SYGHGLCEKVSRGRPYAQIIADVKADFDTRDQYQASYLLSQAVNELCPALIWQLRNSAVD 

NRRSG* 
MTSP17 

vrsylIrieiadrpgslgslavalgsvgadilsldvvergngyaiddlvvelppgampdt 

T TTAAEALNGVRVDSVRPHTGLLEAHRELELLDHVAAAEGATARLQVLVNEAPRVLRVSW 
CTVLRSSGGELHRLAGSPGAPETRANSAPWLPIERAAALDGGADWVPQAWRDMDTTMVAA 
PLGDTHTAWLGRPGPEFRPSEVARLGYLAGIVATMLR* 

MPDGEOSQPPAQEDAEDDSRPDAAEAAAAEPKSSAGPMFSTYGIASTLLGVLSVAAVVIjG 
AMTWSAHRDDSGERTYLTRVMLTAAEWTAVLINMNADNIDASLQRLHDGTVGQLNTDFDA 
WQPYRQWEKLRTHSSGRIEAVAIDTVHRELDTQSGAARPWTTKLPPFATRTDSVLLV 

AT S VS ENAGAKPQTVHWNLRLDVS DVDGKLM I SRLESIR* 



^HIlAAGLTAAAAIGAAAAGVTSIMAGGPWYQMQPWFGAPLPLDPASAPDVPTAA 
OLT^LLNSLADPNVSFANKGSLVEGGIGGTEARIADHKLKKAAEHGDLPLSFSVTNIQPA 
AAG S ATAD VS VS GP KLS S P VTQNVT F VNQGGWML SRAS AME LLQAAGN * 

MNLRRHOTLTLRLLAAS AG I LS AAAFAAPAQANPVDDAF I AAIjNNAGVNYGDPVDAKALG 
QSVCP?LAEPGGSFNTAVASVVARAQGMSQDMAQTFTS I AI SMYCPSVMADVASGNLPALi 

pdmpglpgs* 

Srwstllsiplmiglavpahagpsgddavflasleragityshpdqaiasgkavcalve 
sgeIgSvvnelrtrnpgfsmdgcckfaaisahvycphqitktsvsak* 

S?IttIlrasaglvagmamaaitlapgaraetgeqfpgdgvflvgtdiapgtyrtegpsn 
p^Jwgr^selstcswsthsapevsnenivdtntsmgpmswipptv^ 

RIS* 

StUlIpriiaafttavgaaaiglavatagtagantkdeafiaqmesigvtfsspqvatq 

^lQLVC^^SGETGTEIAEEVIjSQTNIjTTKQAAYFVVDATKAYCPQYASQLT* 




FIG. 1 (continued) 

MTTMITLRRRFAVAVAGVATAAATTVTL1APAPANAA.DVYGAIAYSGNGSWGRSWDYPTRA 
AAEATAVKSCGYSDCKVLTSFTACGAVAANDRAYQGGVGPTIAAAMKDALTKLGGGYIDT 

WACN* 



rTAGAGRPRDRCARIVCTVFIETAWATMFVALLGLSTISSKADDIDWDAIAQC 



MTSP25

ESGGNWAANTGNGLTCGLQISQATWDSNGGVGS 
CSSCSQGDAPLGSLTHILTFLAAETGGCSGSRDD* 



SPIMglvflavlvi fai I waksvali PQAEAAVIERLGRYSRTVSGQLTLLVPFIDR 
V^ARVDLRERVVSFPPQPVI TEDNLTLNIDTWYFQVTVPQAAVYEI SNYI VGVEQLTTT 

AEAIARKPVEGSLGTPPRLTQ* 



tSIIMrfaaafaavliawclpantaaaddklplgggagivvngdtmctltti 

i?^S^^GPGAOIAAEGAENAGPVGIMVAGNDGLDYAVIKFDPAKVTPVAVFNGFA 
?^?GPDP^FGQ?ACKQG^TGisCGVTWGPGESPGTLVMQV 



EFDGPAGSVPDPS^QVSNHRTPlKNFVWt £ pAvmLS ncdpqrsGEIDLIEWYGN 

g^d^pSpfSpgykvfpvlniavggsgggdpatgsypqemlvdwvrvf. 
yrdgevqt i rklngmp sqd * 



|gg2ap™qqtfstsnddaakfdpdfvkadgktcrfnpwpypip* 
FIG. 1 (continued)

MTSP31

MRFYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKI 
KS I SKOFEKTAEDPRFRFFGNWVGEHVQPGELSERYDAVI YAVGAQSDRMLNI PGEDLP 
GS TAAVDFVGWYNAHPHFEQVSPDLSGARAWIGNGNVALDVARI LLTDPDVIARTDI AD 
HALESLRPRGIQEWIVGRRGPLQAAFTTLELRELADLDGVDWIDPAELDGITDEDAAA 

VTKVCKON I KVLRGYADRE PRPGHRRMVFRFLTS P I E I KGKRKVER I VLGRNELVSDGS G 
RVAAKDTGEREELPAQLWRSVGYRGVPTPGLPFDDQSGTIPNVGGRINGSPNEYWGWI 
KRGPTGVIGTNKKDAQDTVDTLIKNLGNAKEGAECKSFPEDHADQVADWLAARQPKLVTS 
AHWQVIDAFERAAGEPHGRPRVKLASIAELLRI GLG * 

MTSP32

vtnpIp^tvdvvvvgagfaglaaareltrqghevlvfegrdrvggrsltgrvagvpadmgg 

QPTCPTODAVLALATELGIPTTPTHRDGRNVIQWRGSARSYRGTIPKLSLTGLIDIGRLR 
WOFERIARGVPVAAPWDARRARELDDVSLGEWLRLVRATSSSRNLMAIMTRVTWGCEPDD 
vqMLHAARYVRAAGGLDRLLDVKNGAQQDRVPGGTQQIAQAAAAQLGARVLLNAAVRRID 

rIgagvivtsdqgqaeagfvivaippahrvaiefdpplppeyqqlahhwpqgrlskayaa 
yStpfwrasgysgqalsdeapvfitfdvsphadgpgilmgfvdargfdslpieerrrdal 
rctaslfgdealdpldyvdyrwgteefapggptaavppgswtkyghwlrepvgpihwast 

etadewtgyfdgavrsgqraaaevaall* 

mkgtkiavvvgmtvaavslaapaqaddydapfnntihrfgiygpqdy^awlakiscerls 
rgwg^yksatflqrnlprgttqgqafqflgaaidhycpehvgvlqragtr* 



SJItMvsavawallgvssaqadpeadpgageanyggppssprlvdhtewaqwgslpsl 

r^Iqvgrta^rlgmaaadaawaev^ 
nlepwrpwddsemlasgcnpgspeesf* 

SSIMkpttsnvsvakiaftgavlggggiamaaqataatdgewdqvarcesggnwsint 

rNGYLGGLQFTQSTWAAHGGGEFAPSAQIASREQQIAVGERVLATQGRGAWPVCGRGLSN 

a^eI^pasaa^ 



PP^PAELAPPAPADLAPPAAVNEQTAPGDQPATAPGGPVGIjATDLELPEPDPQPADAPPP 

gdvteapaetpqvsniaytkklwqairaqdvcgndaldslaqpyvig* 



m^THRKKAMIjALAAASIAATI^PNAVAAAEPSWNGQYLVTLSANAKTGTSMAANRPEYPH 

fSapaksitaytpgqygiltgvfhtdiasgtckgnvdmpvsakpivg* 
FIG. 1 (continued)

MTSP37 

MRYLIATAVLVAWLVGWPAAGAPPSCAGLGGTVQAGQICHVHASGPKYMLDMTFPVDYP 
DOQALTDYITQNRDGFVNVAQGSPLRDQPYQMDATSEQHSSGQPPQATRSWLKFFQDLG 
GAHPSTWYKAFNYNLATSQPITFDTLFVPGTTPLDSIYPIVQRELARQTGFGAAILPSTG 
LDPAHYQNFAITDDSLIFYFAQGELLPSFVGACQAQVPRSAIPPLAI* 

MTSP38

LKNART TLIAAAIAGTLVTTSPAGIANADDAGLDPNAAAGPDAVGFDPNLPPAPDAAPVD 
TPPAPEDAGFDPNLPPPIA.PDFLSPPAEEAPPVP^^YSVNWDAIAQCESGGNWSINTGNG 

YY 



GGLRFTAGTWRANGGSGS AANAS REEQ I RVAENVLRSQG I RAWPVCGRRG^ 



MTSP39

^^StIfdirslrlpk^sakvvwgglvwlawaaaagarlyrk^ 

PGDKVOIMGVRVGS IDKI EPAGDKMRVTLHYSNKYQVPATATAS I LNPSLVASRTI QLS P 
PYTGGPVLODGAVIPIERTQVPVEWDQLRDSINGILRQLGPTERQPKGPFGDLIESAADN 

lagkgrolnetlnslsqaltalnegrgdfvaitrslalfvsalyqndqqfvalnenlaef 

^WFTKSDHDIADTVERIDDVLGTVRKFVSDNRSVLAADVNNLADATTTLVQPEPRDGLE 
TALHVLPTYASNFNNLYYPLHSSLVGQFVFPNFANPIQLICSAIQAGSRLGYQESAELCA 

oySpvLdalkfnylpfgsnpfssaatlpkevayseerlrpppgykdttvpgifsrdtpf 

SHGNHEPGWWAPGMQGMQVQPFTANMLTPESLAELLGGPDIAPPPPGTNLPGPPNAYDE 

snplpppwypqpaslpaagatgqpgpgq* 

j£|Bjj^SGSFAIGLA*^ 

AGGGISGPTRGTGTGINTVGFDASGLIQYAYAGAGLKLPRSSGQMYKVGQKVLPQQARKG 

Sl^fygpegtqsvalylgkgqmlevgdwqvspwtngmtpylwvlgtqptpvqqap^^ 

PAP VQQAPVQQAPVQQAP VQQAPVQQAPVQQ APVQQAP VQP P P FGTARS R * 

Sftrrfaasmvcttltaatlglaalgfagtas as stdeaflaqlqadgi tpp saarai kd 
ahavcdaldeghsakavikavakatglsakgaktfavdaasaycpqyvtss* 

Saamwrrrplssallsfglllgglplaappiagateepgagqtpgapwapqqswnscre 
f^tseirtarcatvsvpvdydqpggtqakiavirvpatgqrfgallvnpggpgasavd 
^a^paiadtdilrhfdlvgfdprgvghstpalrcrtdaefdayrrdpmadyspagvt 

mSoWRO^DCVDRMGFSFlANIGTASVARDMDMWQALGDDQINYI^YSYGTELGTA 
YLERFGTHVRAMVLDGAIDPAVSPIEESISQMAGFQTAFNDYAADCARSPACPLGTDSAQ 

^rySalvdplvqkpgktsdprglsyadattgtinalyspqrwkyltsgllglqrgsda 
gdllvladdydgrdadghysndqdafnavrcvdaptpadpaawvaadqrirqvapflsyg 

QFTGSAPRDLCALWPVPATSTPHPAAPAGAGKVVVySTT 
ITFO 



,GTQHTAVFDGNQCVDSAVMHYFLDGTLPPTSLRCAP* 



FIG. 1 (continued)

MKTGTATTRRRLLAVL I ALA.L P GAAVALLAE P S ATG AS DP C AA S E VARTVG S VAKSMGD Y 

ldsh?eSqvmtavlqqqvgpgsvaslkahfeanpkvasdlhalsqpltdlstrcslpis 
glqai glmqavqgarr* 

SMeMilragaaflvlgiaaatfpqsaaadstedfpiprrmiattcdaeqylaavrdts 

PWYlRYM^F^^LQQATINKAHWFFSLSPAERRDYSEHFYNGDPLTFAVAmHMKIF 
FNNKGWAKGTEVCNGYPAGDMSVWNWA* 

^^TPMTSMGDLLGPEPILLPGDSDAEAELLANESPSIVAAAHPSASVAWAVLAEGA 
LADDKIVTAYAYARTGYHRGLDQLRRHGWKGFGPVPYSHQPNRGFLRCVAAIjAR^ 

ETDE YGRCLDLLDDCDPAAR PALGL * 
E I HS SDTDFARFADLKWTDPLRE * 



YQMP YQP VQ S P TQVEATRQGKS YTLTGTGHAV I 
FIG. 2

mtsp1

a 



?Sitcqcatcgtgcagttcggagtttccgccgtggccgcggcggcgat 
.aaSatcIgSgccgggtcggggatcgcggcggcgttcgacggcgaggacg 
ioa?Sac?lgccccgacgccgaccgcgcgcgcgccgccgcggtgcaggcg 
o??ccSSg?igcaclgcSggagaagtcgagaccgagaccggcgaaggcgc 
cQccgc??S?ggcgtlctggtcacccggcccgacggcacccgtgtcgagg 
tccacc?ggal?gggatttccgggttctggacaccgaaccggccgacggg 

gacggcggttag 

SSSlqctgtcgttgaccgcattgagcgccggtgtaggcgccgtggcaat 
^S??S;?cqtcglggccggggtcgcctccgcagatcccgtggacgcgg 
?ca?ta2?aclac??paa?!acgggcaggtagtagctgcgctcaacgcg 
^Saa?2cQqgggctgccgcacagttcaacgcctcaccggtggcgcagtc 

ccocqclaltgcaagctgtgccgggggcggcacagtacatcggccttgtc 
gagtcggttgccggctcctgcaacaactattaa 



SSScaccggcatcgctagccatgccggcgccctgggtgccgccttagt 
?^?5StcqQcqccgcaattctgcacgacggcccagcagcggccgacc 
Saicclagaig^cgltttctggigctgctcgagaaaaaggaaatcccc 
caaacca^y^ ^ ^ cacaaagtgt gtcg 
9CC9 ^?al?qqcq|catpcggtgaa?gaL?!gtggacgggttacgca 
caaactcgatggcggcacg gg y a tctaccctgtcC gcctc 

ScaacqaccatgacSc^ 

acgacgaccacga^ y taqcqttcqccatggccaattt cgagccgggat 
gaaccatcacagcaagatggcgttcg ^ gcgcggt caac 

^ gaa ^;acqacc?gSggic|tcig?itIggacatgacca t catgtcgcc 
tC99 ?^aqqaSccg?cgg|tgcla?gcttgcctcggtgctcggagcgg 
99gat99 SatScctqataccgaatccgccgccgattccggtaccg 
ttC9 ^S^ocqSagaccc?ga?tc?acccc!gcigatcgtggcaccg^ 
CC9C S?lSoaclaq?qccgclgcaacagccgccgcccccgccgccagagg 

oqcqIcgpag?gg?Igtggcggcggtggtgacggaccggtagagccgtc 
g?c?gcl?gacc?atgccgccgggctttatcaggc4:cgcgccgtga 



™ ts P 4 ^^t-aataccqqqttgcacgctcgtcgggctgatgctgacgtt 

cgacggcccgacggg a acgccaccagc 

agatccgttcgacgaaccggttggc ^ t atcaccacC gt 

Ca9 ^?qqcqagggtc^ 

Stqtgtacacacatgcaggtggtctacccgggggtcaacctcacctcgcc 
Sagcacctgcgcgcaagccaacttttcctag 



i 
FIG. 2 (continued)



itigttttaagaagtaggaaaagcacgctcggcgttgtcgtgtgcttagc 
actqgtgctcggtgggccgctcaacggttgcagcagcagcgcgagccacc 
acqgtccactgaacgcaatgggaagtccggccataccgtcgacggcgcag 
qaaatacccaacccgttgcgcggtcagtacgaagacctcatggaaccgct 




catqatcgacaacgcgttgaccaggctcgccgaccgcggcatgcggctga 

cactQCQggtgtacgcctacagctcgtgctgcaaggcttcctatccggac 

qacactaacatcgcgattcccgactgggagcgcgctatcgccagcaccaa 

ciccagttatccagggccggcgaccgatccctcgaccggggtggtgcagg 

tqqtgccgaatttcaacgattcgacctatcttaacgattttgcgcagttg 

ctcgccgcgcttggtcgccgctacgacggtgacgagcgcctcagcgtgtt 

cqagttctccgggtacggggacttcagcgaaaatcacgtcgcatacctgc 

glgacacgctcggtgcgccgggtccgggcccggatgaaagcgtggcgacc 

ctqqqctattacagccagttccgtgatcagaacatcaccaccgcgtccat 

caaacagctaatcgcggcgaacgtcagcgccttcccgcatacccaactgg 

taaccagtcccgcEaatccggaaatcgtgcgagaactgttcgccgacgag 

glcaccaacaagcttgccgcgccggtgggtgtccgctcggattgcctggg 

catcqacqcgccgttgccggcctgggccgagtccagcacttcgcactatg 

tlcagSclaSagacccggSlgtcgccgcgctgcggcagcggctggcaacg 

qlqccggtgatcaccgagtggtgcgagttgccgaccggcagttcgccgcg 

qqct?SItlcgagaagggcctgcgcgacgtcatcaggtatcacgtgtcga 

?qacg?cgagcg£taiSttccccgaccagacggcgacctcgccgatggac 

c?Sq?gt?g?aIctggtgtgggcgcaagctaacgccgccgcaggctatcg 

qtSc??gg?cgaagcicag2Sggggtcgcaagcgctagcgggcaaggtcg 

IgJcgalltclgtlalctggaccaactacggcgctgctgccgccaccgaa 

S15?iggtgccSggctaccggctggtggattccaccggacaggtggttcg 

qS?qc?Iccggclgcggtggacctgaagacgctggtctccgaccagcgcg 

qcStclcapagcgaScagccgacaccggcgtcggtcgccgagacggtt 




gfcgcSa?gc?cgIcatcccacgSgacgcicagaccgcggtcaacgcttc 
gtag 



flSfeccgactcctagctttgctgtgcgctgcggtatgcacgggctgcgt 
?qc?ltggttctcgciccag?gagcctggccgtcgtcaacccgtggttcg 
cqSaltlggtcgglaatgcSactcaggtggtttcggtggtgggaaccggc 
gS?cga?igc?Iagatigatgtctaccaacgcaccgccgccggctggca 
Qlcqc?caagaccggta?caccacccatatcggttcggcgggcatggcgc 
^ggfSgccalgagSlgatatccggccactccgatgggggtttacagcctg 
gactclgctt£tggcaccgcgccgaatcccggtggcgggttgccgtatac 
ccaaqtcggacccaatcactggtggagtggcgacgacaatagccccacct 
t?aactccltgcaggtctgt2agaagtcccagtgcccgttcagcacggcc 

gaSagcgagaLcllcaaatc^ 

cq?clacalggccaaggtcccaggcaaaggctccgcgttcttctttcaca 
ccaccgacgi?gggc?caccgcgggttgtgtggcgatcgacgatgccacg 






FIG. 2 (continued) 

ctggtgcagatcatccgttggctgcggcctggtgcggtgatcgcgatcgc 
caagtaa 



mtsp7

atgattcgcgaactggtcaccaccgctgcgatcacgggtgccgcgatcgg 
tggggcgccagtcgcgggcgcagacccgcagcgttatgacggcgatgtgc 
cggggatgaactatgacgcttcgctgggcgccccatgctccagctgggag 
cgcttcatttttggacgaggcccctccggtcaggccgaagcctgtcattt 
tccgcctcctaaccagttcccgccggccgaaaccggctactgggtgatct 
cctacccgctatacggcgtccagcaggtcggtgcgccgtgtccgaagccg 
caqgcggccgcgcagtctccggatgggttgccgatgctgtgtctgggagc 
ccgtggatggcagccgggatggtttaccggggccgggttcttccctccgg 

agccataa 



aTq^itgaattacggttggtgggcggtgtgctccgggtccttgtcgtggt 
cqqtgcggtgttcgatgtggcggtgctaaacgccggtgcggctagtgccg 
acigcccggtccagctgaagagccgattgggcgatgtttgcctggacgcc 
ccgagtgggagctggttcagcccgctggtgatcaacccctgcaatgggac 
caactttcagcgctggaatctcaccgatgaccggcaggtcgagagcgtgg 
ccttccccggggaatgcgtgaatatcggaaatgctttgtgggcgcgcctg 
caqccctgtgtgaactggatcagccagcactggactgtccagcccgacgg 
cctggtcaagagtgatcttgatgcctgcctcacggttctcggcggtccgg 
atcctgggacctgggtgtccacccgctggtgcgaccccaatgcacccgac 

caacagtgggatagcgtgccgtaa 

atSSlggccatgaccgcccgttcggtggtactcagcgtgctgctcggtgc 
tcatcccgcgtgggccaccgcaagcgaattgatccagctgacagcggatt 
tcqgtatcaaggagacgacgttgcgggtcgcgctgacccgcatggtcggt 
accgqqgatctggtccggtccgcggacggctaccggctctcggatcggtt 
qctqqcccgccagcgccgacaagatgaggccatgcgcccacggacccgcg 
cttggcacggaaactggcacatgctgattgtcaccagcatcggcaccgat 
qctcqtacccgggccgcactgcgaacctgcatgcaccacaagcgtttcgg 
Egaattgcgggaaggggtgtggatgcggccggacaatctcgacctcgact 
tggagtccgacgttgcggcccgggttaggatgctgacggcccgcgacgag 
gllcccgccgacttggccgggcagctgtgggatctgtcggggtggaccga 




atqttqcccgctgaactgttgcccgccgactggccgggcgccgggttacg 
ggcggcgtaccacgacttcgccactgcaatggcgaaacgacgcgatgcaa 

ctcaactcctggaggtgacatga 



SfScigccggcgtcggtaacgcatccggtagcgttttagatatgacgtc 
cgtgcicacagtgccaagcgccgtcgcgctggtgacgtttgccggagccg 
FIG. 2 (continued)

cgctcagcggggtcatcccggcgattgcccgcgcggatccggtcgggcat 
caaatgacctacaccgtcacgaccaccagcgacctgatggccaacattcg 
S?Sca?gagcgccgatccgcccagcatggcggctttcaatgccgattcat 
coaagticltgattaccttgcacactccgatcgctggcggtcagccgctg 
glc??taccgicacgctggcaaacccgagccagtgggcgatcgtcaccgc 
caacggcggcctgcgggtcaatccggagttccactgcgagattgttgtag 
acgg??^gt:gg?gitgtcgcaggacggcggcagcggcg t gcagtgctcg 

actcgtccctggtaa 

SSSciaccagcaaaatcgccaccgccttcaagaccgccaccttcgcgct 

Salcglcggtgccgttgcactgggattggccagccccgccgacgcagcgg 

^ggcaclitltafggigacccggcagccgccgccaagtactgg^ 

raaacatacqacgactgcgtcctgatgtcggccgcggacgtgatcggtca 

aalgaScgglagggagccEtccgagcgcgccatcatcaaagtggcccagt 

caaSccSgcgtcgtgcaccccgggtccatctacacaaagccggccgac 

olSSaqcaccclaaltigggaatgggtaccagcgtggccgacataccgac 

IctlSggcgclStacglcltcglcgccgttatcaccgacgagga^ 

rSraaccaccqgagtcgccaccggcatggccgccctcgagcagtatctg 

SoiSacQgScallSlgtlatcgtcagcatcaacgccgagatgatctgggg 

ccagcc?g?cgagga^ 

ccSlggtlgtiallggtgtcgataccgaaaacggcattgttcacctcaac 
g ^?Sa?acccccac5ggccgcgacgagcagatcccgatggaaacctt 
?g?cgaI|cg?gggccScSgc?acga? t tcatggccgtcaccacc t ga 




ac?a??tcl?;t?gS?gglcg|cacggcctcggacttctgcagtgcgccc 
aacgcgaa?tacciaaccaccgcgagctcatggccgggctag 



acc 
cag 
FIG. 2 (continued)

aactacaaaatcagctcggcatcctcaacggcttactcggaccgactgga 
gggcgagtgcctcagccctaa 

itqatcacaaacctccgacgccgaaccgcgatggcagccgccggcctagg 
ggctgctctcgggctgggcatcctgctggttccgacggtggacgcccatc 





accaggtgcggcgcggtcgcctacaacggctcgaaataccaaggcggaac 
cggactcacgcgccgcgcggcagaagacgacgccgtgaaccgactcgaag 
gcgggcggatcgtcaactgggcgtgcaactaa 



SSScggtgctgctcgacgccaacgtgctgatcgcattggtggtcgccga 




cqctcgggacagtccgcggcggcggctcgggatgtcgtcagtgcggtcca 
g?gca?|SgccIccaIgaat t ctggcccgatgcactctctttcgccggtg 
Icqaqqtcqctggtgtggttgggcaccggcaggtgaccgatgcctacctt 
gccSllctcgcIIglalccacgacgggcagttggcgacgctcgacagcgg 
?tSg?acalc?gSacggcgacgtcgcggtactcattccaacgaccacct 



c 
ga 



ffiSSiicgccaatcattgatgccccagcagacccttgccgccggcgtttt 
Iglqgltgcgctgctatgcggtgtcgtgacggcggcggtgccaccacacg 
clcacqclglcgtggtcgcctatctggtcaacgtgacggtacgcccgggc 
tacla?t?lgccaI?gcSgacgccgcgttgagttacggacatggcct:ctg 
caaaSSgg?Itctcgiggccgcccttacgcacagatcatcgccgacgtca 
S q g??g??J?cgaclIclgclaccaataccaggcct:cgtatctgc t cagc 
clggc?gtcaalgaactctgccccgcgctgatctggcagttgcgaaactc 

cgcagtcgacaatcggcgctcgggctga 

iJSgitcgtatctattgcgtatcgagctggccgaccggccgggcagcct 

!gig!cgc?ggcggtcgcgctcggctcggtgggcgccgacatcctctcgc 

tclacgtggtcgagcgcggcaacggctatgcgatcgacgacctggtggtc 

galS?ic?Iccggiaiciatgcccgacacgctga t cactgctgccgaggc 

Ictgalcggcgtccgggtagacagcgtccgcccgcacaccggcctgttgg 

laglccalcgcgagctggaactgctcgatcatgtggccgcggctgagggc 

gclaccgcacggc?ccl|gttctggtcaacgaggccccccgggtgc^ 

gglgagctggtgcacggtgttgcgcagttccggcggggagctgcaccgtc 

fgglcggcllclcaggtgcgccggagacccgggccaattcggcgccctgg 

ctqccqatcgagcgggccgcggcgctggacggcggcgccgactgggtgcc 

acacgcacaccgcggtggtgctgggcaggccaggcccggaatttcgcccg 
FIG. 2 (continued)

tcggaggtggcgcggttgggttatctagccggcatcgtggcgacgatgct 
gcgctga 

mtsp18

itic^tgacggggagcagagccagccaccggcccaagaagatgcggaaga 
cQactcgcggcccgacgccgcggaggccgccgcggccgaacccaaatcat 
caqccqgtccgatgttctcgacctacggtatcgcctcgacactactcggc 
qtgctatcggtcgccgcggtcgtgctgggtgcgatgatctggtccgcaca 
ccgcgatgactccggcgagcgtacctacctgacccgggtcatgctgaccg 
ccqctgaatggacggccgtgctgatcaacatgaacgccgacaacatcgat 
qccagcctgcagcgactgcacgacggaacggtcggtcaactcaacaccga 
cttcgacgctgtcgtgcagccctaccggcaggtggtggagaagttgcgga 



aScSclgtttgccaStcgSaccgactcggtgctgctggtcgcgacgtcgg 




gattcgatga 



aTiSaiatggtgaaatcgatcgccgcaggtctgaccgccgcggctgcaat 
SpacqccgcEgcggccggtgtgacttcgatcatggctggcggcccggtcg 
tl?SicalatlcIlccggtcgtcttcggcgcgccactgccgttggacccg 
oSaSccgcccltgacgEcccgaccgccgcccagttgaccagcctgctcaa 



calSQcfltcgtciccggtcacgcagaacgtcacgttcgtgaatcaaggc 
gc?igatgc?gtlacilgcatcggcgatggagttgctgcaggccgcagg 

gaactga 



KiSiSctacggcgccatcagaccctgacgctgcgactgctggcggcatc 
cgcgggcattScIgcgccgcggccttcgccgcgccagcacaggcaaacc 
ccgl?|acgacgcgttcatcgccgcgctgaacaatgccggcgtcaactac 
aacgatccggtcgacgccaaagcgctgggtcagtccgtctgcccgatcct
glc?gagc?cgg?ggitcgtttaacaccgcggtag<:cagcgt t gtggcgc 
gcgc?SSaggIatg?2ccaggacatggcgcaaaccttcaccagtatcgcg 
????cgatilactgcccctcggtgatggcagacgtcgccagcggcaacct 
gccggccctgccagacatgccggggctgcccgggtcctag 



SiSSaiattQtgtcaacgctactcagcattccgttgatgatcggcttggc 
aptSlSgclclcgcggigcccagcggtgacgacgcggtctttcttgcct 
?actaqlqcgggcaggcattacctacagccacccggatcaagccatagca 
??gggSIggSgtl?gcgcg t tagtcgaaagcggcgaatcgggtcttca 
aatcgtcaacgagctgcggacccgcaatcccgggttttcgatggacggtt
Ptglaag^igltglgllctccgcgcatgtctattgcccccaccagatc 

actaaaaccagcgtcagcgcgaaatag 



StSqlicgcacgcttgcgttgcgcgcatcggcgggactcgtcgcgggtat 
aacaatggccgcgatcacgctcgcacctggggcccgcgccgaaaccggtg 
Igcaattccccggggatggggtgtttctcgtgggaactgacattgcgcca 
ggcacctaccgcacggaggggccgtcgaatccccttattttggtgttcgg 
tgtccga gctctcaacctgctcatggtcgacacacagcgcacccg 
IglfgaicaaEgagaacattgtcgacaccaacacctctatgggcGcgatg 
tcagtggtgatcccgccgaccgtggcagccttccagacgcataactgcaa 

gctttggatgcggatctcatag 

StSStcgccgttatcgcctcgcattatcgcagcgttcaccactgcagt 
Sagcgccgccgccatcggacttgccgtcgccaccgccggcaccgccggcg 
c?IacSccaalgacgaagccttcattgctcagatggagtccattggcgtc 
SccJtctcStcacclcaggtggccacccagcaagcccagctggtctgcaa 
aaagctggccagcggcgaaaccggcaccgagatcgccgaggaggtcctca
icclaaccaacctllccactaagcaggcagcctacttcgtcgtcgacgca 
-accaaggcctactgcccgcaatacgccagccagctcacctag 

ffiifiiacgatgattactcttcggcgacggttcgcggtggccgtcgccgg 
SatSQccactgccgccgcgacgaccgtcaccctggctcccgcaccagcaa 
S?qccqccga?gtcta!gicgcaattgcctactccggcaacggctcgtgg 
SQ?cgl?citglgactacccaacccgggcggctgccgaagccaccgccgt 
Ilag?cg?ItlPtactccgactgcaaggtgctcaccagtttcaccgcct 
oSqgcgccitclccgccaacgatagggcataccagggaggagttggaccc 
IcStggcIgccgccatgaaggacgccctgaccaagctcggcggcggcta 

catcgacacctgggcctgcaactaa 

lEqfciccgggtttgcttactactgcgggtgctggccgaccacgtgacag 
atSqccalgatcgEatgcacggtgttcatcgaaaccgccgttgtcgcga 
?^Sta??tq?cgcittgttgggtctgtccaccatcagctcgaaagccgac 
gaca?cga?tglgScgcca?clcgclatgcgaatccggcggcaattgggc 




cSacSqatcgaggtcgcagacaacattatgaaaacccaaggcccgggtgc 
aSqqc?qaalti?ag?tc£tgtagtcagggagacgcaccgctgggctcgc 
tclIccacKcctgacgttcctcgcggccgagactggaggttgttcgggg 

agcagggacgattga 

SScffggagccgttgctggtctggtgtttctggccgtcctggtgatttt 

Iq?ca?cl?cgtigtggccaagtcggtggcgctgatcccgcaggcggagg 

ccgcgg^gatcglicggctgggtcgctatagtcgtacggtcagtgggc^ 

^qScqctgttggHgccgttcatcgaccgcgtccgggctcgggtggacct 

acqcqlqcggglggtgtcgtttccgccgcaaccggtgatcaccgaggaca 

!c??lacgc?galcalcglcaccgtcgtctacttccaggtgaccgttccg 

caggcggcggtgtacgagatcagcaattacatcgtcggggtcgaacagct 

caccaccaccaccctgcgcaacgttgtcggcgggatgacgctggagcaga 

cgttgacctcgcgtgaccagatcaacgcccagctgcgcggcgttctcgat 

gaggcgaccggccgctggggtctgcgggtggcgcgggtggagctgcgcag 

catcgatccgccgccgtcgattcaggcgtcgatggaaaagcagatgaagg 

ccgaccgggagaagcgagcgatgattctgaccgccgaaggtacccgggag 

gcggcgataaaacaggccgaggggcaaaagcaggcgcagatcctggccgc

cgagggcgccaagcaggccgcgatcttggctgctgaggccgatcggcagt 

ctcggatgctgcgcgctcagggtgagcgcgccgcggcctacctgcaggcg 

caagggcaggccaaggccatcgagaagacgttcgccgcgatcaaggctgg 

ccggcccaccccggagatgctggcctaccaatacctgcagacgctgccgg 

agatggcgcgtggggacgccaacaaggtatgggtggtgcccagcgacttc

aacgccgcactgcaggggttcaccaggctgctgggcaagccgggtgagga 

cggggtgttccggttcgagccgtccccggtcgaagaccagcccaagcacg

cagccgacggtgacgacgccgaggtcgccggctggttctccaccgatacc

gacccgtcgatcgctcgggcggtggctacagccgaggcgatagcccgcaa

gccggtcgagggttcgctggggacgccccccaggttgactcaatag 

EtSaqacggcgcacaggcgctttgccgcggcattcgcggccgtgctttt 

aaccgttgtgtgcctacctgcgaacaccgcggcagccgacgacaagctac

cactgggcggtggtgcgggcatcgtcgtcaacggggacaccatgtgcacc

c?aa?IacSt?ig?cIliacaagaacggtgacctcatcggcttcacttc 

cgcccactgtgggggcccgggcgcgcagatcgccgctgagggtgccgaga

alqcgggcccggtaggcatcatggtcgccggcaacgacggcctggactac 

gcggtgatcaagttcgacccggccaaggtgaccccggtggccgtcttcaa

cgggtttgcgatcaacggcattggcccggacccgtcgttcggccagatcg 

cctgcaagcagggccgcaccaccggtaactcgtgcggggttacctggggg

ccaggggagagtccgggcacccttgtgatgcaggtctgcggcggaccggg

caactclqQtgcgccggtgaccgtcgacaatctgctggtcgggatgatcc 

Scggcgcl?t?agcgI2aItc t gccgagttgcatcaccaaatacatcccg 

ctgcacaccccggcggtggtgatgtcgatcaacgccgacctggccgacat

ca?cgccaagaa5cggccgggcgcgggattcgtcccggta'ccggcctga 



iEiS^Eatgcctgagatggatcgtcgccgaatgatgatgatggcggggtt 
cggcgccctggctgccgcgcttcccgccccgacagcctgggccgacccgt
cScqqccqgccgcgccggctggtccgacaccggcgcccgccgcgccggct 

•gcglLac^gggcttttgttc^ 




gacagtcgacaiaacgtgttcctcgacggcaactccaatctcgtgctgcg 
cgctacccgagagggcaacaggtatttcggtggcctggtccacggcctgt
glcgggg?lgla??lggaccacctgggaggcccggatcaagttcaact.gc 
l?gllilciggcatgtggcccgcctggtggttgtccaatgacgatcctgg 
?cl?Sgcgiliaaa?cgacctgatcgagtggtatggcaacgggacttggc 
cgtcgggaaccaccgtgcacgccaacccggacggcaccgcattcgagacc
tgcccgatcggtgtggacggtggttggcacaactggcgcgtcacgtggaa 
tScgagcggcatgtacttctggctggattacgccgacggcattgagccct 
actlclcggttccggcgaccggaatcgaagacctcaacgagcccatccgc 



gaatggccgttcaacgaccccggctacaaggtgtttccggtgttgaacct
tgcggttggcggttctggtggcggcgatcccgcgacgggttcctatccac 
aggagatgctcgtcgactgggtgcgcgtcttttaa 



gl|cafcgtcgaacggccctgaagctcccgctgctgctggcggcaggcac 




acaaactacatcacctcgaacgccatcaaccagctcgagatgttccagcc 
aaacacatacgatccccggcgcatcgacaacgagctgggccttgcgcggt 
ttcacgggttcaacaccgtgcgagtcttcctccacgacctgctgtgggcc
caagacgcgcccggtttccaaacccggctcgcgcagttcgtcgccatcgc 



aacScoatlccacatcaaaccgctctttgtcctgttcgactcctgctggg 
ISccqctccccagaccgggtcggcagcgggcgccaagggctggggtgcac 
aactccqggtgggtgcaaagtccgggtgctgaacgcctcgatgaccgccg 
c?Stqc?agcISgc?gtacaactacgtcacgggtgtgttgggccaattcc 
gcaalgacgatcgcgtgttgggttgggacctgtggaatgaacccgacaat 
Sccgcgcgcgtgtatcgcaaggtggaaaggaaagacaagctegagcgcgt 
cgcggagitSctcccccaagtgttccgatgggcccgcacggtcgatccgg 
ttcaaccgctgaccagtggtgtctggcaagggaattggggagatcccgga 



oTqfc^acgtacggctggcgcgcctacgccctgccggttctgatggtgct 
aaccacqqfggtlgtgtacSagacggtgaccgggacgagcacgccaaggc 
?cqcggllgc?clia?cgtccgggactcgccggccattggtgtggtgggg 
Jclqcgalcctcglcgcaccgcctcgcggtcttgcagtgttcgatgccaa 
?S?a C ?qqccgggacgctgccggatggcggcccgtt:caccgaggctggtg 

acSqtcaaagtg?tcaggtataccgtcgagatcgagaacggtcttgatcc 
caclatqtacggcggtgacaacgcattcgcccagatggtcgaccagacgt 
tSacSaltSccLiigctggacccacaatccgcaattcgcgttcgtgcgg 
a?SqacagcggaaSlIcclIcttccggatttcgctggtgt:cgccgacgac 
aq?qcqcgggiggtgtggctacgaattccggctcgagacgtcctgctaca 
alctqfcgftlllcggcatggatcgccaatcgcgggtgttcatcaacgag 
qcqcqc?qggta?gcggagccgttccattcgaaggtgacgtaggttccta 

?cgg?aa?a?gtga t laa^ 

acilcgagccgtgcgaccaacaaggcggtctggctccggtaatgatgcag 
cagacg??tt?clcltccaatgacgacgcggccaagtttgaccccgactt 
cg??alggcggatggaaagacctgccgattcaatccctggccctacccga 

ttccctaa 

mtsp31 

atgcgtccctattacatcgccatcgtgggctccgggccgtcggcgttctt 

cgccgcggcatccttgctgaaggccgccgacacgaccgaggacctcgaca 

tggccgtcgacatgctggagatgttgccgactccctgggggctggtgcgc 

tccggggtcgcgccggatcaccccaagatcaagtcgatcagcaagcaatt 

cgaaaagacggccgaggacccccgcttccgcttcttcggcaatgtggtcg 

tcggcgaacacgtccagcccggcgagctctccgagcgctacgacgccgtg 

atctacgccgtcggcgcgcagtccgatcgcatgttgaacatccccggtga 

ggacctgccgggcagtatcgccgccgtcgatttcgtcggctggtacaacg 

cacatccacacttcgagcaggtatcacccgatctgtcgggcgcccgggcc 

gtagttatcggcaatggaaacgtcgcgctagacgtggcacggattctgct 

caccgatcccgacgtgttggcacgcaccgatatcgccgatcacgctttgg 

aatcgctacgcccacgcggtatccaggaggtggtgatcgtcgggcgccga 

ggtccgctgcaggccgcgttcaccacgttggagttgcgcgagctggccga 

cctcgacggggttgacgtggtgatcgatccggcggagctggacggcatta 

ccgacgaggacgcggccgcggtgggcaaggtctgcaagcagaacatcaag 

gtgctgcgtggctatgcggaccgcgaaccccgcccgggacaccgGcgcat 

ggtgttccggttcttgacctctccgatcgagatcaagggcaagcgcaaag 

tggagcggatcgtgctgggccgcaacgagctggtctccgacggcagcggg 

cgagtggcggccaaggacaccggcgagcgcgaggagctgccagctcagct 

ggtcgtgcggtcggtcggctaccgcggggtgcccacgcccgggctgccgt

tcgacgaccagagcgggaccatccccaacgtcggcggccgaatcaacggc 

agccccaacgaatacgtcgtcgggtggatcaagcgcgggccgaccggggt 

gatcgggaccaacaagaaggacgcccaagacaccgtcgacaccttgatca

agaatcttggcaacgccaaggagggcgccgagtgcaagagctttccggaa 

gatcatgccgaccaggtggccgactggctagcagcacgccagccgaagct

ggtcacgtcggcccactggcaggtgatcgacgctttcgagcgggccgccg

gcgagccgcacgggcgtccccgggtcaagttggccagcctggccgagctg 

ttgcggattgggctcggctga 

mtsp32

gtgacaaacccaccgtggactgtcgatgttgtcgtggtgggcgcgggctt 

cgccgggctggccgcggcccgcgagctgacgcgacagggtcacgaggtgc 

tggtgttcgaaggccgcgatcgggtgggcggccgctcgttaaccggtcgc

gtggcaggggtgcccgcggatatgggcggctcgttcatcggcccgaccca

Igacgccgtgctggcgttggccaccgagctggggatcccgacaaccccga 

cccaccgcgacggccgaaacgtcatccagtggcggggatcggcacgcagc 

tatcgtggcaccatccccaagctgtcgctgaccgggctcatcgacatcgg 

ccggttgcgttggcaattcgagcgaattgcccgcggcgttccggtggccg 

ccccctgggatgcgcggcgcgcgcgtgaactcgacgacgtgtcgctcggg 




tgcacgccgcccgctacgtacgcgcggccggcggcctggaccggctgctc
gacgtcaaaaatggtgcccagcaggaccgtgtgccgggggggacacagca 
gatcgcccaggcggccgccgcccaactcggcgcacgcgtcctgctcaacg
ccgcggtgcgtcgcatcgaccggcacggagcgggtgtgacggtcacgtcc 
gatcagggtcaggccgaggccgggttcgtcatcgtcgccattccaccggc 
ccatcgcgtggccatcgagttcgatcccccgctgccgccggaatatcagc 
agctcgcccaccattggccgcagggccggctgagcaaggcctacgcggcc 
tattcgacgccgttctggcgggccagcgggtattccggccaggcgctgtc 
cgatgaggcgccggtgttcatcaccttcgacgtcagtccgcacgccgacg 




cagcgacgaagcgctcgacccccttgattatgttgactatcgttggggta
cagaggaattcgcgccgggtggtccgaccgcggcggtaccgccggggtcg
tggacgaaatacggtcactggttacgtgagccggtcggtccgattcactg
ggcgagcactgagaccgcggacgaatggaccgggtatttcgacggcgccg
Icagatccggtcagcgtgccgccgccgaggtcgccgccctgctatga 

iEqaaqggaacaaagctggctgttgtcgtcggcatgacggtggctgccgt 
tagtttggcagcgccggcgcaggccgacgactacgacgcccccttcaaca
acacgatccatcgcttcgggatctacggcccgcaggactacaacgcttgg
cttgccaagatcagctgcgaacggctgagcagaggcgttgacggcgatgc
atacaagtcggccactttcctgcaacgcaacctgccgcgcggaaccaccc
agggccaagcgtttcagttcctgggcgccgcgatcgatcactactgccct 
gagcatgtgggcgtcctgcaacgggctggcacccgctaa 



^fegccctggtggccgtgtcggcggtggccgtcgtcgcactgctcgg 
?q?J?Stccgcccaagctgatcccgaggcggatcccggcgcaggtgagg 
ccalctatggHggccccccaagttccccacgtcttgtcgatcacaccgaa 
tqqqcqcagtggggaagtctgcccagcctccgggtctacccgtcccaagt 
tllicgtaLpitcclgccgcctcgggatggccgctgccgacgcggcct 
qqqScgaggttctcgcgctgtcaccggaggccgacactgccggcatgcgc 
qcqca|??catctgccactggcagtacgccgaaatcagacaacccggcaa 
IcccStggaacctcgagccgtggcggccggtcgtcgacgactcggaga 
tgttggct?ccggctgcaatccgggcagccctgaagagtcgttttag 



SiSofggacgccaccgtaagcccaccacatccaacgtcagcgtcgccaa 

qalcqclittlccggcgcagtactcggtggcggcggcatcgccatggccg 

Scaqgcgaccgcggccaccgacggggaatgggatcaggtggcccgctgc 

aaaSqqqcgglalctggtcgatcaacaccggcaacggttacctcggtgg 

ct?gclg??clctcaalgcaictgggccgcacatgg t ggcggcgagttcg 

ccclqtlggctcagctggccagccgggagcagcagattgccgtcggtgag 

cggg?gc?ggccaccclgggtcgcggcgcctggccggtgtgcggccgcgg 

qflltlgallgcaacaccccgcgaagtgcttcccgcttcggcagcgatgg 

Icqctccgttggacgcggccgcggtcaacggcgaaccagcaccgctggcc 

Sc?ccqcccgIIgacc2ggcgccacccgtggaacttgccgctaacgacct 

qcccqiSccgctiggtgaacScctcccggcagctcccgccgacccggcac 

ISccIgccglccliicSccacccgcgcccgccgacgtcgcgccacccgtg 

SJScttqccgtaaacgacctgcccgcaccgctgggtgaacccctcccggc 

Kctcclgccgaccclgcaccacccgccgacctggcaccacccgcgcccg 

S?Sacc?lgciccacicgcgcccgccgacctggcgccacccgcgcccgcc 

gS^ggllcIacccgtlgaacttgccgtaaacgacctgcccgcgccgct 

ggg^galcccctcccggcagctcccgccgaactggcgccacccgccgatc 

Ealcacccgcgtccgccgacctggcgccacccgcgcccgccgacctggcg 

cilcccgcgc?cgcigalctgg2gccacccgcgcccgccgacctggcacc 

Scccgc?g?ggtiaa?gagcaaaccgcgccgggcgatcagcccgccacag 



ctccaggcggcccggttggccttgccaccgatttggaactccccgagccc 
gacccccaaccagctgacgcaccgccgcccggcgacgtcaccgaggcgcc
cgccgaaacgccccaagtctcgaacatcgcctatacgaagaagctgtggc
aggcgattcgggcccaggacgtctgcggcaacgatgcgctggactcgctc 
gcacagccgtacgtcatcggctga 

^tqtccggacaccgcaagaaggcaatgctcgccttggcggctgcgtcgct 
ggcagcgacgctggccccgaacgcagtcgcggccgcagaaccgtcgtgga
acgggcagtacctcgtgacgttgtctgccaacgcgaaaaccggcaccagc
atggcggccaaccggccagagtatccacacaaagcgaactacacgttcag
ctcacgctgcgcgtccgatgtctgcattgccaccgtggtcgacgctccgc 
caccaaaaaacgagttcatcccgcggccaatcgaatacacctggaatggg 
actcaatgggtacgggagatcagctggcaatgggactgcctgctacccga 
cggcacaatcgaatatgccccagccaaatcgatcacggcctacacgcccg
q?Saqtacggaatcctcaccggcgtctttcataccgatatcgccagcggc 
acgtgtaaaggcaatgtcgacatgccagtgtcggccaaaccgatcgttgg 

ctga 

SSlfttatctgatagcgaccgcagtgctcgttgctgtggtcctggtggg 
ctggScggcggctggtgcgccgccgtcatgcgccggcctgggcggcactg 
tqcaggcSggccagatctgccatgtgcacgcctcgggccctaagtacatg 
ctqqltatgacatttcctgtcgactatcccgaccagcaggcgctgaccga 
ctacatcacgcaaaaccgcgacgggttcgtcaacgtcgcgcaggggtccc 
cgctgcgagaccagccctaccaaatggacgccaccagcgaacagcacagc
tlcglccaiccgcSgcaggccacccgcagcgtagtgctcaaattcttcca 
aaacctcggtggggcacatccgtccacctggtacaaggccttcaactaca 
IlScgclIcItlgcagcccatcaccttcgacacgttgttcgtgcccggc 
Iccacgccactggacagcatctaccccatcgttcagcgcgagc^ 




gcccagggtgagctgctgccgtcgtttgtcggcgcttgccaagcccaggt
gccgcgcagcgccattccgccgctggcaatctaa 

Sffifaacgcccgtacgacgctcatcgccgccgcgattgccgggacgtt 
qqfgaccacgtcaccagccggtatcgccaatgccgacgacgcgggcttgg 
ISclaaacgccgcagccggcccggatgccgtgggctttgacccgaacctg 
ccgcSggclccggacgctgcacccgtcgatactccgccggctccggagga 
cgcgggctttgatcccaacctccccccgccgctggccccggacttcctgt
ccccacctgcggaggaagcgcctcccgtgcccgtggcctacagcgtgaac
?ggglcgcla!igcgcagtgcgagtccggtggaaactggtcgatcaacac 
ciitaacggttactacggcggcctgcggttcaccgccggcacctggcgtg 
ccaacagtggctcggggtccgcggccaacgcgagccgggaggagcagatc
cggg?glc?lagaa?l?gctgcg?tcgcagggtatccgcgcctggccggt 

ctgcggccgccgcggctga 



lESilaccatcttcgacatccgcagcctgcgactgccgaaactgtctgc 
aalggtagtggtcgtcggcgggttggtggtggtcttggcggtcgtggccg 

ctgcggccggcgcgcggctctaccggaaactgactaccactaccgtggtc 

gcgtatttctctgaggcgctcgcgctgtacccaggagacaaagtccagat 

catgggtgtgcgggtcggttctatcgacaagatcgagccggccggcgaca 

agatgcgagtcacgttgcactacagcaacaaataccaggtgccggccacg 

gctaccgcgtcgatcctcaaccccagcctggtggcctcgcgcaccatcca 

gctgtcaccgccgtacaccggcggcccggtcttgcaagacggcgcggtga 

tcccaatcgagcgcacccaggtgcccgtcgagtgggatcagttgcgcgat 

tccatcaatgggatcctccgccagctcggcccgacggagcggcagccgaa 

ggggccgttcggcgacctcatcgaatcggccgcggacaacctggccggGa 

agggcaggcagctcaacgaaacgctgaacagtttgtcgcaggcgttgacc 

gcgctgaacgagggccggggagacttcgttgcgatcacgcgaagcctggc 

gctatttgtcagcgcgctctaccagaatgatcaacagttcgttgcgctca 

acgaaaaccttgccgagttcaccgactggttcaccaaatccgaccatgac 

ttggccgacacggtggaacggatcgacgacgttctcggcaccgtccgaaa 

gttcgtgagcgacaacagatccgtgctggctgccgatgtcaacaacctcg 

ccgacgcgaccactacactagtgcaacccgagccgcgggacggtctggaa 

accgcgttgcacgtgttgccgacctacgccagcaacttcaacaaccttta 

ctatccactgcacagctctctggtgggccagttcgtgttccccaacttcg 

cgaacccaattcagctcatttgcagcgctattcaggccggcagccgactc 

ggctatcaggaatccgccgagctgtgcgcgcagtacttggcaccggttct

ggacgctctcaagttcaattacttgccgttcggctcaaacccgttcagtt

cggcggccactttgcccaaggaggtggcttactccgaggagcggctccgc

ccgccgcccgggtacaaggacaccactgtcccagggatcttctcgcggga 




gaatcgctggcagagctgctgggtggtccggatattgcccccccgccgcc
gggaaccaacttgcccggaccgccgaatgcgtatgacgagtccaatccgt
tgccgccgccgtggtacccgcagcccgcgtccctcccggctgcgggcgcc 
acaggacagccaggcccgggccagtga 



^iSiicgcagcatgaaaagcggctccttcgcgatcggtctggcaatgat 
gctcgccccgatggtggccgcgcccggtcttgcggccgcagacccggcca
dqcaqccggtggattatcaacagatcaccgacgtcgtgatcgcgcgcggg 
ctgtcgcagcgcggcgtgccgttctcctgggccggcggcggcatcagcgg
ccccacgcgcggcaccggtaccggcatcaacaccgtcgggttcgacgcct
ccggtttgatccagtacgcctatgccggtgccgggctaaagctgccgcgt
tcttccggccagatgtacaaggttgggcaaaaggtcctgccgcagcaagc
gcgcaagggcgacctgatcttctacggccccgaaggcacgcaaagcgtcg
cgttatacctcgggaagggccagatgctggaggtgggcgacgtcgtccag
gtttcgccggtgcgcaccaacggcatgacgccttacctggtccgggttct
cgggacccagccgacgcccgtccaacaggcgccggtccagccagcgccgg
tccagcaagcgcccgtccagcaagcgcccgtccaacaggcgcccgtccaa 
caggcgccggtccaacaggcgccggtccagcaagcgcccgtccagcaagc 
gcccgtccagccgcctcccttcggcaccgcgcgctcacgctaa 



atgttcactcgccgtttcgccgcctccatggttggcaccaccttgactgc 
cgctactttgggcctggccgcactcggcttcgccgggaccgccagcgcaa 
gctcgaccgacgaagcgttcctcgcgcagctgcaggcggacgggatcact 

ccgccgagcgcagcgcgcgccatcaaggacgcgcacgccgtctgcgacgc 
cctcgacgagggtcactcggccaaagcggtcatcaaggcggtggccaagg 
cgaccggtctgagcgccaagggcgccaagacgttcgccgttgacgccgcg 
tcggcctactgcccgcagtacgtgacctcgagctaa 

Sogctgccatgtggcgccgcagaccgttgagctcggcgctgctgtcctt 

cqqgttgctgctcggcggactgcccctagcagcgcccccgttggccggcg 

cqactgaagaacccggcgccggccaaaccccgggtgcgccggtcgtggcg 

ccqcaacagagttggaacagctgccgcgagttcatcgccgacaccagcga 

aattcgcactgcacgctgcgcgacggtgtccgtccccgtcgactacgacc 

aacccggtgggacacaagcgaagttggcggtgatccgcgtccccgcgacg 

aqacaacgattcggagcactgctggtcaatcctgggggacccggggcgtc 

Sacagtcgacatggtcgccgctatggcacccgcgatcgccgacaccgaca 

ttctccgccacttcgacctggtgggcttcgacccgagaggggtcggccac 

tcgacccctgcgttgcggtgtcgcaccgacgccgagttcgacgcgtaccg

gcgcgatccgatggccgactacagtccggccggtgtcacccacgtcgaac

Igitctaccggcagttggcccaggactgtgttgaccggatgggcttcagc 

ttcttggccaatatcggtaccgcgtccgtcgcacgggacatggacatggt 

tcgccaagcgttaggtgacgatcagatcaactacctcggatacagctacg 

acaccgagttgggcaccgcttacctggaacggttcggtactcatgtgcgg 

Icgatggtcctcgacggcgctatcgatccagccgttagcccaatcgagga 

IScatcagccaaatggcgggatttcagaccgctttcaatgactacgccg 

ccqactgcgcccgctcgccggcctgccctctgggcaccgactcggcccag 

?qgqtclaccgctaccacgccctggttgacccgctggtgcagaagccggg 

t ||= acgtcgg atccacgtggcctgagctacgccgacgcgacgacgggca 

ccatcaacgcgctgtacagccctcagcgctggaagtacctgaccagtggt 

c?qctgggicigcagcgcggcagcgacgccggcgacttgctggtgcttgc 

cqacglltatgacggccgggatgcagacgggcactacagcaacgaccagg 

acgcl^tcaalgclgtccggtgcgtcgatgcgcccacaccggccgatcca 

qcqqcctgggtggccgccgaccaacggatccgtcaggtcgccccgttcct 

?ap2aclggclltt?accggatccgccccccgcgatctgtgcgcgctgt 

qqccggtgccggcaacgtcgacgccgcaccccgcggcgccggccggggct 

oaraaaatcqtcgtggtgtccaccacccacgacccggccactccgtatca 

Itccgllg^gacctlglccgccagctgggcgcaccgctgatcaccttcg 

acggilllcalcacactgcggtgttcgatggcaaccagtgtgtggactct 

gcggtgatgcactattttctcgacgggaccttgccgccgacgagtctgcg 

gtgcgcgccctga 

Sfafiacaggcaccgcgacgacgcggcgcaggctgttggcagtactgat 
Sg?cc?cgci?tgccgggggccgccgttgcgctgctggccgaaccatcag 

ctqcacglgctttcgcaaccgctgaccgatctttcgactcggtgctcgct 
gccgatcagcggcctgcaggcgatcggtttgatgcaggcggtgcagggcg 

cccgccggtag 

atgtctcggctgagttccatcctgcgtgccggcgcggcatttctggttct
cggcatcgccgctgcgacatttccacaaagcgcggcagccgactccacgg
aaaactttccaatacctcgccggatgatcgcaaccacctgcgacgccgaa
caatatctggcggcggtgcgggataccagtccggtgtactaccagcggta
catgatcgacttcaacaaccatgcaaaccttcagcaagcgacgatcaaca
aggcgcactggttcttctcgctgtcaccggcggagcgccgagactactcc
glacacttttacaatggcgatccgctgacgtttgcctgggtcaatcacat 
gaaaatcttcttcaacaacaagggcgtcgtcgctaaagggaccgaggtgt
gcaatggatacccagccggcgacatgtcggtgtggaactgggcctaa 

SESSHaagcgcacaataactcccatgacgtcgatgggtgatctcttgggacctgagcca 

2tiS?q??gcItggcgacagcgacgccgaagcggagctgcttgccaacgaaagtccgagc 

atcqtlgcggccllglatccgtcggcgtcggtcgcctgggcggtgctcgccgaaggggcg 

??aaccqJcqacaagaccgtcacggcctacgcatacgcgcgtaccgggtaccaccgcggc 

ctilac?agSgcgccgccatgg??ggaagggc t tcggcccggtgccgtattcc 

cclllccglggttlccHacggtgtgtggcggcgctggcgcgcgccgcagccgctatcggc 

gSgaccgScglgtatggacgctgcctggatctgcttgacgactgtgaccccgcggcccgt 

ccggcgcttgggctc 

StSStfatccctgacatcaatctgctgctctacgcggtcatcaccggattcccgcagcac 
SaaSacqcqcStgcgtggtggcaSgacaccgtcaacggccacacccgtatcgggctgacg 
?!?c?qqci??g?tlgigtlictacggatcgccaccagtgcccgcgtgc t cgccgcgcca 
c ?accSlc?gcigatilgatcgcctatgtgcgcgagtggctttcgcagccgaacgtggac 
^ac?SIcqqcSgtIcicgc?acctggacatcgcgttgggcctgctcgacaagctcggc 
Scaacc;q?Ia§l?aaccalcgatgtgcaactggccgcctacggcatcgaatacgacgcc 
gagltcS^ScSgtgacacclactttgcccgattcgccgatctgaagtggaccgacccg 

ttgcgcgaa 

SScToatccgcgccacaccgrttcgaatcgctgtcggagctaccgcgctcggcgtgtcg 
^?^o5aScSac?ctgccggcctgctccgcacacagcgggccgggttctccccccagt 




acccaaatLaua **Z. a rni-raataa ttttqqtatctcccttaaaatcggaagcgtcgac 
fSccSg;?Iccc^;iagicgStIlgtccllaac,„oa tca aaac.acca gg ca gg gc 

aagagttacacactgaccgggacgggtcacgcgc 
gagctgccgttcggggtacatgtaacctgtccg 